From slivkaje at gmail.com Sat Dec 1 12:05:06 2012 From: slivkaje at gmail.com (Jelena Slivka) Date: Sat, 1 Dec 2012 13:05:06 -0500 Subject: [petsc-users] Solving A*X = B where A and B are matrices Message-ID: Hello! I am trying to solve A*X = B where A and B are matrices, and then find trace of the resulting matrix X. My approach has been to partition matrix B in column vectors bi and then solve each system A*xi = bi. Then, for all vectors xi I would extract i-th element xi(i) and sum those elements in order to get Trace(X). Pseudo-code: 1) load matrices A and B 2) transpose matrix B (so that each right-hand side bi is in the row, as operation MatGetColumnVector is slow) 3) set up KSPSolve 4) create vector diagonal (in which xi(i) elements will be stored) 5) for each row i of matrix B owned by current process: - create vector bi by extracting row i from matrix B - apply KSPsolve to get xi - insert value xi(i) in diagonal vector (only the process which holds the ith value of vector x(i) should do so) 6) sum vector diagonal to get the trace. However, my code (attached, along with the test case) runs fine on one process, but hangs if started on multiple processes. Could you please help me figure out what am I doing wrong? Also, could you please tell me is it possible to use Cholesky factorization when running on multiple processes (I see that I cannot use it when I set the format of matrix A to MPIAIJ)? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Experiment.c Type: text/x-csrc Size: 3789 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Abin Type: application/octet-stream Size: 136 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Bbin Type: application/octet-stream Size: 136 bytes Desc: not available URL: From bsmith at mcs.anl.gov Sat Dec 1 17:03:33 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 1 Dec 2012 17:03:33 -0600 Subject: [petsc-users] Solving A*X = B where A and B are matrices In-Reply-To: References: Message-ID: <55DD94DF-150F-4917-AA26-C0680107E065@mcs.anl.gov> We recommend following the directions http://www.mcs.anl.gov/petsc/documentation/faq.html#schurcomplement for computing a Schur complement; just skip the unneeded step. MUMPS supports a parallel Cholesky but you can also use a parallel LU with MUMPS, PaSTIX or SuperLU_Dist and those will work fine also. With current software Cholesky in parallel is not tons better than LU so generally not worth monkeying with. Barry On Dec 1, 2012, at 12:05 PM, Jelena Slivka wrote: > Hello! > I am trying to solve A*X = B where A and B are matrices, and then find trace of the resulting matrix X. My approach has been to partition matrix B in column vectors bi and then solve each system A*xi = bi. Then, for all vectors xi I would extract i-th element xi(i) and sum those elements in order to get Trace(X). 
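(For anyone following the FAQ route Barry points to above: it amounts to converting B to a dense matrix, factoring A once with a parallel direct solver, and calling MatMatSolve() for all right-hand sides at once; the trace is then just the sum of the diagonal of X. A rough, untested sketch in PETSc-3.3-style C is below. It assumes A and B are already loaded as square parallel matrices, that PETSc was configured with MUMPS, and it leaves out cleanup; the variable names are placeholders, not code from the attached Experiment.c.)

   Mat            F,Bdense,X;
   Vec            diag;
   IS             isrow,iscol;
   MatFactorInfo  info;
   PetscScalar    trace;
   PetscErrorCode ierr;

   /* MatMatSolve() wants a dense right-hand-side matrix */
   ierr = MatConvert(B,MATDENSE,MAT_INITIAL_MATRIX,&Bdense);CHKERRQ(ierr);
   ierr = MatDuplicate(Bdense,MAT_DO_NOT_COPY_VALUES,&X);CHKERRQ(ierr);

   /* factor A once with a parallel direct solver (MUMPS here; SuperLU_Dist or PaSTIX work the same way) */
   ierr = MatGetFactor(A,MATSOLVERMUMPS,MAT_FACTOR_LU,&F);CHKERRQ(ierr);
   ierr = MatFactorInfoInitialize(&info);CHKERRQ(ierr);
   ierr = MatGetOrdering(A,MATORDERINGNATURAL,&isrow,&iscol);CHKERRQ(ierr);
   ierr = MatLUFactorSymbolic(F,A,isrow,iscol,&info);CHKERRQ(ierr);
   ierr = MatLUFactorNumeric(F,A,&info);CHKERRQ(ierr);

   /* solve A*X = B for all columns of B in one call */
   ierr = MatMatSolve(F,Bdense,X);CHKERRQ(ierr);

   /* Trace(X) = sum of the diagonal entries of X */
   ierr = MatGetVecs(A,&diag,PETSC_NULL);CHKERRQ(ierr);
   ierr = MatGetDiagonal(X,diag);CHKERRQ(ierr);
   ierr = VecSum(diag,&trace);CHKERRQ(ierr);

This avoids the per-column KSPSolve loop entirely, which is also where a parallel hang is easy to trigger if the processes do not all make the same collective calls the same number of times.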
> Pseudo-code: > 1) load matrices A and B > 2) transpose matrix B (so that each right-hand side bi is in the row, as operation MatGetColumnVector is slow) > 3) set up KSPSolve > 4) create vector diagonal (in which xi(i) elements will be stored) > 5) for each row i of matrix B owned by current process: > - create vector bi by extracting row i from matrix B > - apply KSPsolve to get xi > - insert value xi(i) in diagonal vector (only the process which > holds the ith value of vector x(i) should do so) > 6) sum vector diagonal to get the trace. > However, my code (attached, along with the test case) runs fine on one process, but hangs if started on multiple processes. Could you please help me figure out what am I doing wrong? > Also, could you please tell me is it possible to use Cholesky factorization when running on multiple processes (I see that I cannot use it when I set the format of matrix A to MPIAIJ)? > > From w_ang_temp at 163.com Sun Dec 2 08:45:47 2012 From: w_ang_temp at 163.com (w_ang_temp) Date: Sun, 2 Dec 2012 22:45:47 +0800 (CST) Subject: [petsc-users] Is there something to be paid attention to about MatIsSymmetric? Message-ID: <7c30a630.9645.13b5c1487f1.Coremail.w_ang_temp@163.com> Hello, I use MatIsSymmetric to know if the matrix A is symmetric. According to my model, it should be symmetric due to the theory. But I always get the result 'PetscBool *flg = 0', although I set 'tol' a large value(0.001). Because the matrix is of 20000 dimension, I can not output the matrix to the txt. So I want to konw if there is something to be paid attention to about the function 'MatIsSymmetric' in version 3.2. Or do I have some other ways to determine the symmetry.I think symmetry is one of the most important thing in my analysis. Thanks. Jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sun Dec 2 09:10:51 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sun, 2 Dec 2012 09:10:51 -0600 Subject: [petsc-users] Is there something to be paid attention to about MatIsSymmetric? In-Reply-To: <7c30a630.9645.13b5c1487f1.Coremail.w_ang_temp@163.com> References: <7c30a630.9645.13b5c1487f1.Coremail.w_ang_temp@163.com> Message-ID: The test for symmetry is not implemented for all matrix types. Looking at the code, it seems to only be SeqAIJ, but MatIsTranspose(A,A,...) would also work for MPIAIJ. On Sun, Dec 2, 2012 at 8:45 AM, w_ang_temp wrote: > Hello, > > I use MatIsSymmetric to know if the matrix A is symmetric. > > According to my model, it should be symmetric due to the theory. > > But I always get the result 'PetscBool *flg = 0', although I > > set 'tol' a large value(0.001). > > Because the matrix is of 20000 dimension, I can not output the > > matrix to the txt. So I want to konw if there is something to be paid > attention to > > about the function 'MatIsSymmetric' in version 3.2. Or do I have some > other ways > > to determine the symmetry.I think symmetry is one of the most important > thing > > in my analysis. > > Thanks. > > Jim > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From w_ang_temp at 163.com Sun Dec 2 12:09:06 2012 From: w_ang_temp at 163.com (w_ang_temp) Date: Mon, 3 Dec 2012 02:09:06 +0800 (CST) Subject: [petsc-users] Is there something to be paid attention to about MatIsSymmetric? 
In-Reply-To: References: <7c30a630.9645.13b5c1487f1.Coremail.w_ang_temp@163.com> Message-ID: <6113982e.243.13b5ccead36.Coremail.w_ang_temp@163.com> Maybe the matrix in my project is true unsymmetric. I use MatIsTranspose and get the same result. Maybe I need to check my constitutive model. >At 2012-12-02 23:10:51,"Jed Brown" wrote: >The test for symmetry is not implemented for all matrix types. Looking at the code, it seems to only be SeqAIJ, but MatIsTranspose(A,A,...) would also work >for MPIAIJ. >>On Sun, Dec 2, 2012 at 8:45 AM, w_ang_temp wrote: >>Hello, >> I use MatIsSymmetric to know if the matrix A is symmetric. >>According to my model, it should be symmetric due to the theory. >>But I always get the result 'PetscBool *flg = 0', although I >>set 'tol' a large value(0.001). >> Because the matrix is of 20000 dimension, I can not output the >>matrix to the txt. So I want to konw if there is something to be paid attention to >>about the function 'MatIsSymmetric' in version 3.2. Or do I have some other ways >>to determine the symmetry.I think symmetry is one of the most important thing >>in my analysis. >> Thanks. >> Jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sun Dec 2 12:18:23 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sun, 2 Dec 2012 12:18:23 -0600 Subject: [petsc-users] Is there something to be paid attention to about MatIsSymmetric? In-Reply-To: <6113982e.243.13b5ccead36.Coremail.w_ang_temp@163.com> References: <7c30a630.9645.13b5c1487f1.Coremail.w_ang_temp@163.com> <6113982e.243.13b5ccead36.Coremail.w_ang_temp@163.com> Message-ID: Check boundary conditions. For debugging, do MatTranspose() followed by MatAXPY() to see the difference A - A^T. On Sun, Dec 2, 2012 at 12:09 PM, w_ang_temp wrote: > Maybe the matrix in my project is true unsymmetric. I use MatIsTranspose > and get > the same result. Maybe I need to check my constitutive model. > -------------- next part -------------- An HTML attachment was scrubbed... 
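(A concrete version of Jed's MatTranspose()/MatAXPY() suggestion above, as an untested sketch with error checking trimmed; A is assumed to be the assembled MPIAIJ matrix. MatIsTranspose() gives a yes/no answer, while the explicit difference shows how large the asymmetry actually is, which helps distinguish a genuinely unsymmetric operator from, say, boundary-condition rows modified on only one side.)

   Mat            At;
   PetscReal      nrm;
   PetscBool      flg;
   PetscErrorCode ierr;

   ierr = MatIsTranspose(A,A,1.e-8,&flg);CHKERRQ(ierr);               /* PETSC_TRUE if A equals A^T up to tol */

   ierr = MatTranspose(A,MAT_INITIAL_MATRIX,&At);CHKERRQ(ierr);
   ierr = MatAXPY(At,-1.0,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); /* At <- A^T - A */
   ierr = MatNorm(At,NORM_FROBENIUS,&nrm);CHKERRQ(ierr);
   ierr = PetscPrintf(PETSC_COMM_WORLD,"||A^T - A||_F = %G\n",nrm);CHKERRQ(ierr);
   ierr = MatDestroy(&At);CHKERRQ(ierr);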
URL: From agrayver at gfz-potsdam.de Mon Dec 3 06:37:30 2012 From: agrayver at gfz-potsdam.de (Alexander Grayver) Date: Mon, 03 Dec 2012 13:37:30 +0100 Subject: [petsc-users] valgrind complains about string functions Message-ID: <50BC9D0A.2040803@gfz-potsdam.de> Hello, I'm using PETSc-3.3-p4 compiled with ICC 12.0 + IntelMPI 4.0.3 and getting a bunch of the errors related to the string functions: ==22020== Conditional jump or move depends on uninitialised value(s) ==22020== at 0x4D3109: __intel_sse2_strcpy (in /home/main) ==22020== by 0xE87D51D: PetscStrcpy (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE87B6A4: PetscStrallocpy (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE796769: PetscFListGetPathAndFunction (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE79652A: PetscFListAdd (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE64ACB8: MatMFFDRegister (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE64FA7D: MatMFFDRegisterAll (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE64F65B: MatMFFDInitializePackage (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE48D8C2: MatInitializePackage (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE5157DB: MatCreate (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE29A74C: MatCreateSeqAIJ (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) Same thing for PetscStrncat etc. There was similar question two years ago in this mailing list and advice was to use a different compiler. It is not an option for me. Thus, my question is can those errors potentially cause any serious troubles? I came across with time trying to debug a weird segmentation fault. Thanks. -- Regards, Alexander From tim.gallagher at gatech.edu Mon Dec 3 07:47:45 2012 From: tim.gallagher at gatech.edu (Tim Gallagher) Date: Mon, 3 Dec 2012 08:47:45 -0500 (EST) Subject: [petsc-users] valgrind complains about string functions In-Reply-To: <50BC9D0A.2040803@gfz-potsdam.de> Message-ID: <1358153325.5949105.1354542465447.JavaMail.root@mail.gatech.edu> This is a known bug in Valgrind and aside from being annoying and making it darn near impossible to find real problems, there's nothing that can be done about it. 
Tim ----- Original Message ----- From: "Alexander Grayver" To: "PETSc users list" Sent: Monday, December 3, 2012 7:37:30 AM Subject: [petsc-users] valgrind complains about string functions Hello, I'm using PETSc-3.3-p4 compiled with ICC 12.0 + IntelMPI 4.0.3 and getting a bunch of the errors related to the string functions: ==22020== Conditional jump or move depends on uninitialised value(s) ==22020== at 0x4D3109: __intel_sse2_strcpy (in /home/main) ==22020== by 0xE87D51D: PetscStrcpy (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE87B6A4: PetscStrallocpy (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE796769: PetscFListGetPathAndFunction (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE79652A: PetscFListAdd (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE64ACB8: MatMFFDRegister (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE64FA7D: MatMFFDRegisterAll (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE64F65B: MatMFFDInitializePackage (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE48D8C2: MatInitializePackage (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE5157DB: MatCreate (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE29A74C: MatCreateSeqAIJ (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) Same thing for PetscStrncat etc. There was similar question two years ago in this mailing list and advice was to use a different compiler. It is not an option for me. Thus, my question is can those errors potentially cause any serious troubles? I came across with time trying to debug a weird segmentation fault. Thanks. -- Regards, Alexander From jedbrown at mcs.anl.gov Mon Dec 3 09:51:26 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Mon, 3 Dec 2012 07:51:26 -0800 Subject: [petsc-users] valgrind complains about string functions In-Reply-To: <1358153325.5949105.1354542465447.JavaMail.root@mail.gatech.edu> References: <50BC9D0A.2040803@gfz-potsdam.de> <1358153325.5949105.1354542465447.JavaMail.root@mail.gatech.edu> Message-ID: Specifically, Intel's vectorized string routines are reading partway into uninitialized memory, branching on the result, but doing so in a way that makes the result independent of what was there (assuming null-terminated string). You can make a Valgrind suppression for it. On Mon, Dec 3, 2012 at 5:47 AM, Tim Gallagher wrote: > This is a known bug in Valgrind and aside from being annoying and making > it darn near impossible to find real problems, there's nothing that can be > done about it. 
> > Tim > > ----- Original Message ----- > From: "Alexander Grayver" > To: "PETSc users list" > Sent: Monday, December 3, 2012 7:37:30 AM > Subject: [petsc-users] valgrind complains about string functions > > Hello, > > I'm using PETSc-3.3-p4 compiled with ICC 12.0 + IntelMPI 4.0.3 and > getting a bunch of the errors related to the string functions: > > ==22020== Conditional jump or move depends on uninitialised value(s) > ==22020== at 0x4D3109: __intel_sse2_strcpy (in /home/main) > ==22020== by 0xE87D51D: PetscStrcpy (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE87B6A4: PetscStrallocpy (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE796769: PetscFListGetPathAndFunction (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE79652A: PetscFListAdd (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE64ACB8: MatMFFDRegister (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE64FA7D: MatMFFDRegisterAll (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE64F65B: MatMFFDInitializePackage (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE48D8C2: MatInitializePackage (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE5157DB: MatCreate (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE29A74C: MatCreateSeqAIJ (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > > Same thing for PetscStrncat etc. > > There was similar question two years ago in this mailing list and advice > was to use a different compiler. It is not an option for me. > Thus, my question is can those errors potentially cause any serious > troubles? I came across with time trying to debug a weird segmentation > fault. > > Thanks. > > -- > Regards, > Alexander > > -------------- next part -------------- An HTML attachment was scrubbed... 
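(To expand on the suppression idea: an entry of roughly the following shape in a file, say intel-string.supp, silences these Intel string-routine reports. The name on the first line is arbitrary and the frame list is an illustration, not copied from a tested file; running valgrind once with --gen-suppressions=all prints ready-made blocks that can be pasted in instead.)

   {
      intel-sse2-string-uninit
      Memcheck:Cond
      fun:__intel_sse2_str*
      ...
   }

and then, for example:

   mpiexec -n 2 valgrind --suppressions=intel-string.supp ./myapp -malloc off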
URL: From balay at mcs.anl.gov Mon Dec 3 10:53:28 2012 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 3 Dec 2012 10:53:28 -0600 (CST) Subject: [petsc-users] valgrind complains about string functions In-Reply-To: <50BC9D0A.2040803@gfz-potsdam.de> References: <50BC9D0A.2040803@gfz-potsdam.de> Message-ID: On Mon, 3 Dec 2012, Alexander Grayver wrote: > Hello, > > I'm using PETSc-3.3-p4 compiled with ICC 12.0 + IntelMPI 4.0.3 and getting a > bunch of the errors related to the string functions: > > ==22020== Conditional jump or move depends on uninitialised value(s) > ==22020== at 0x4D3109: __intel_sse2_strcpy (in /home/main) > ==22020== by 0xE87D51D: PetscStrcpy (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE87B6A4: PetscStrallocpy (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE796769: PetscFListGetPathAndFunction (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE79652A: PetscFListAdd (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE64ACB8: MatMFFDRegister (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE64FA7D: MatMFFDRegisterAll (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE64F65B: MatMFFDInitializePackage (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE48D8C2: MatInitializePackage (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE5157DB: MatCreate (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE29A74C: MatCreateSeqAIJ (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > > Same thing for PetscStrncat etc. > > There was similar question two years ago in this mailing list and advice was > to use a different compiler. It is not an option for me. You can always use a separate build of PETSc with gcc,--download-mpich to get a valgrind clean build [for debugging purposes] > Thus, my question is can those errors potentially cause any serious troubles? Generally we can ignore issues valgrind finds in system/compiler libraries. [Jed has a valid explanation for this one]. And generally valgrind provides 'default suppression files' for known glibc versions. But for such issues as with ifc, you can ask valgrind to create a supression file - and then rerun valgrind with this custom supression file - to get more readable output. Satish > I came across with time trying to debug a weird segmentation fault. > > Thanks. > > From fande.kong at colorado.edu Mon Dec 3 12:38:18 2012 From: fande.kong at colorado.edu (Fande Kong) Date: Mon, 3 Dec 2012 11:38:18 -0700 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? Message-ID: Hi all, Can anyone guess the possible reason of the following errors: [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in src/sys/utils/mpimesg.c [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in src/vec/vec/utils/vpscat.c [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c I have been working for several days to figure out the reason, but now I still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I tried to use vecscatter to distribute the mesh. 
When the mesh was small, everything was ok. But when the mesh became larger about 14,000,000 elements, I got the above errors. -- Fande Kong Department of Computer Science University of Colorado at Boulder -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Dec 3 12:41:16 2012 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 3 Dec 2012 12:41:16 -0600 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? In-Reply-To: References: Message-ID: On Mon, Dec 3, 2012 at 12:38 PM, Fande Kong wrote: > Hi all, > > Can anyone guess the possible reason of the following errors: > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in > src/sys/utils/mpimesg.c > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in > src/vec/vec/utils/vpscat.c > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c Partial error messages are generally not helpful. Matt > I have been working for several days to figure out the reason, but now I > still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I tried to > use vecscatter to distribute the mesh. When the mesh was small, everything > was ok. But when the mesh became larger about 14,000,000 elements, I got the > above errors. > > -- > Fande Kong > Department of Computer Science > University of Colorado at Boulder > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From slivkaje at gmail.com Mon Dec 3 13:08:54 2012 From: slivkaje at gmail.com (Jelena Slivka) Date: Mon, 3 Dec 2012 14:08:54 -0500 Subject: [petsc-users] Solving A*X = B where A and B are matrices In-Reply-To: <55DD94DF-150F-4917-AA26-C0680107E065@mcs.anl.gov> References: <55DD94DF-150F-4917-AA26-C0680107E065@mcs.anl.gov> Message-ID: Thank you very much! However, I have another question. I have a cluster of 4 nodes and each node has 6 cores. If I run my code using 6 cores on one node (using the command "mpiexec -n 6") it is much faster than running it on just one process (which is expected). However, if I try running the code on multiple nodes (using "mpiexec -f machinefile -ppn 4", where machinefile is the file which contains the node names), it runs much slower than on just one process. This also happens with tutorial examples. I have checked the number of iteration for KSP solver when spread on multiple processors and it doesn't seem to be the problem. Do you have any suggestions on what am I doing wrong? Are the commands I am using wrong? On Sat, Dec 1, 2012 at 6:03 PM, Barry Smith wrote: > > We recommend following the directions > http://www.mcs.anl.gov/petsc/documentation/faq.html#schurcomplement for > computing a Schur complement; just skip the unneeded step. MUMPS supports a > parallel Cholesky but you can also use a parallel LU with MUMPS, PaSTIX or > SuperLU_Dist and those will work fine also. With current software Cholesky > in parallel is not tons better than LU so generally not worth monkeying > with. > > Barry > > > On Dec 1, 2012, at 12:05 PM, Jelena Slivka wrote: > > > Hello! > > I am trying to solve A*X = B where A and B are matrices, and then find > trace of the resulting matrix X. My approach has been to partition matrix B > in column vectors bi and then solve each system A*xi = bi. Then, for all > vectors xi I would extract i-th element xi(i) and sum those elements in > order to get Trace(X). 
> > Pseudo-code: > > 1) load matrices A and B > > 2) transpose matrix B (so that each right-hand side bi is in the row, as > operation MatGetColumnVector is slow) > > 3) set up KSPSolve > > 4) create vector diagonal (in which xi(i) elements will be stored) > > 5) for each row i of matrix B owned by current process: > > - create vector bi by extracting row i from matrix B > > - apply KSPsolve to get xi > > - insert value xi(i) in diagonal vector (only the process which > > holds the ith value of vector x(i) should do so) > > 6) sum vector diagonal to get the trace. > > However, my code (attached, along with the test case) runs fine on one > process, but hangs if started on multiple processes. Could you please help > me figure out what am I doing wrong? > > Also, could you please tell me is it possible to use Cholesky > factorization when running on multiple processes (I see that I cannot use > it when I set the format of matrix A to MPIAIJ)? > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fd.kong at siat.ac.cn Mon Dec 3 13:12:53 2012 From: fd.kong at siat.ac.cn (Fande Kong) Date: Mon, 3 Dec 2012 12:12:53 -0700 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? In-Reply-To: References: Message-ID: More details for the errors: [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in src/sys/utils/mpimesg.c [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in src/vec/vec/utils/vpscat.c [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c [0]PETSC ERROR: SpmcsSFCreateVecScatter() line 96 in SpmcsSFComm.cpp [0]PETSC ERROR: moveDataBetweenRootsAndLeaves() line 133 in SpmcsSFComm.cpp [0]PETSC ERROR: SpmcsSFCreateNormalizedEmbeddedSF() line 359 in SpmcsSFComm.cpp [0]PETSC ERROR: SpmcsSFDistributeSection() line 343 in SpmcsSection.cpp [0]PETSC ERROR: SpmcsMeshDistribute() line 444 in distributeMesh.cpp [0]PETSC ERROR: DMmeshInitialize() line 32 in mgInitialize.cpp [0]PETSC ERROR: main() line 64 in linearElasticity3d.cpp application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 ===================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = EXIT CODE: 256 = CLEANING UP REMAINING PROCESSES = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES ===================================================================================== [proxy:0:1 at node1778] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:1 at node1778] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:1 at node1778] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:2 at node1777] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:2 at node1777] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:2 at node1777] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:3 at node1773] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:3 at node1773] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:3 at node1773] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:4 at node1770] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:4 at node1770] HYDT_dmxu_poll_wait_for_event 
(./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:4 at node1770] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:6 at node1760] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:6 at node1760] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:6 at node1760] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:7 at node1758] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:7 at node1758] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:7 at node1758] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:8 at node1738] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:8 at node1738] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:8 at node1738] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:9 at node1736] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:9 at node1736] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:9 at node1736] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:10 at node1668] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:10 at node1668] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:10 at node1668] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:11 at node1667] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:11 at node1667] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:11 at node1667] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:12 at node1658] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:12 at node1658] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:12 at node1658] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:13 at node1656] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:13 at node1656] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:13 at node1656] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:14 at node1637] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:14 at node1637] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:14 at node1637] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:15 at node1636] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:15 at node1636] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:15 at node1636] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:16 at node1611] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:16 at node1611] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): 
callback returned error status [proxy:0:16 at node1611] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:17 at node1380] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:17 at node1380] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:17 at node1380] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:18 at node1379] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:18 at node1379] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:18 at node1379] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:19 at node1378] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:19 at node1378] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:19 at node1378] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:20 at node1377] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:20 at node1377] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:20 at node1377] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:21 at node1376] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:21 at node1376] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:21 at node1376] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:22 at node1375] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:22 at node1375] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:22 at node1375] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:23 at node1374] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:23 at node1374] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:23 at node1374] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:24 at node1373] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:24 at node1373] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:24 at node1373] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:25 at node1372] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:25 at node1372] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:25 at node1372] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:26 at node1371] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:26 at node1371] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:26 at node1371] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:27 at node1370] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:27 at node1370] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error 
status [proxy:0:27 at node1370] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:28 at node1369] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:28 at node1369] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:28 at node1369] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:29 at node1368] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:29 at node1368] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:29 at node1368] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:30 at node1367] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:30 at node1367] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:30 at node1367] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:31 at node1366] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:31 at node1366] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:31 at node1366] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [mpiexec at node1780] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting [mpiexec at node1780] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion [mpiexec at node1780] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for completion [mpiexec at node1780] main (./ui/mpich/mpiexec.c:385): process manager error waiting for completion It seems nothing. On Mon, Dec 3, 2012 at 11:41 AM, Matthew Knepley wrote: > On Mon, Dec 3, 2012 at 12:38 PM, Fande Kong > wrote: > > Hi all, > > > > Can anyone guess the possible reason of the following errors: > > > > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in > > src/sys/utils/mpimesg.c > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in > > src/vec/vec/utils/vpscat.c > > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c > > Partial error messages are generally not helpful. > > Matt > > > I have been working for several days to figure out the reason, but now I > > still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I > tried to > > use vecscatter to distribute the mesh. When the mesh was small, > everything > > was ok. But when the mesh became larger about 14,000,000 elements, I got > the > > above errors. > > > > -- > > Fande Kong > > Department of Computer Science > > University of Colorado at Boulder > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > -- Fande Kong ShenZhen Institutes of Advanced Technology Chinese Academy of Sciences -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Dec 3 13:19:12 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 3 Dec 2012 13:19:12 -0600 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? 
In-Reply-To: References: Message-ID: <3416FED3-493A-42E6-83BB-EB661E69A90B@mcs.anl.gov> Perhaps some bad data is being passed into VecScatterCreate(). I would suggest having SpmcsSFCreateVecScatter validate the IS's and Vecs being passed in. For example, do the IS have tons of duplicates, how long are they etc? Barry On Dec 3, 2012, at 1:12 PM, Fande Kong wrote: > More details for the errors: > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in src/sys/utils/mpimesg.c > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in src/vec/vec/utils/vpscat.c > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c > [0]PETSC ERROR: SpmcsSFCreateVecScatter() line 96 in SpmcsSFComm.cpp > [0]PETSC ERROR: moveDataBetweenRootsAndLeaves() line 133 in SpmcsSFComm.cpp > [0]PETSC ERROR: SpmcsSFCreateNormalizedEmbeddedSF() line 359 in SpmcsSFComm.cpp > [0]PETSC ERROR: SpmcsSFDistributeSection() line 343 in SpmcsSection.cpp > [0]PETSC ERROR: SpmcsMeshDistribute() line 444 in distributeMesh.cpp > [0]PETSC ERROR: DMmeshInitialize() line 32 in mgInitialize.cpp > [0]PETSC ERROR: main() line 64 in linearElasticity3d.cpp > application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 > > ===================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = EXIT CODE: 256 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > ===================================================================================== > [proxy:0:1 at node1778] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:1 at node1778] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:1 at node1778] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:2 at node1777] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:2 at node1777] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:2 at node1777] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:3 at node1773] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:3 at node1773] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:3 at node1773] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:4 at node1770] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:4 at node1770] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:4 at node1770] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:6 at node1760] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:6 at node1760] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:6 at node1760] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:7 at node1758] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:7 at node1758] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:7 at node1758] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:8 at node1738] HYD_pmcd_pmip_control_cmd_cb 
(./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> [... quoted proxy error lines for the remaining nodes trimmed; they repeat the log shown earlier in this thread ...]
> [proxy:0:29 at node1368] main (./pm/pmiserv/pmip.c:214): demux
engine error waiting for event > [proxy:0:30 at node1367] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:30 at node1367] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:30 at node1367] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:31 at node1366] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:31 at node1366] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:31 at node1366] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [mpiexec at node1780] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting > [mpiexec at node1780] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion > [mpiexec at node1780] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for completion > [mpiexec at node1780] main (./ui/mpich/mpiexec.c:385): process manager error waiting for completion > > It seems nothing. > > On Mon, Dec 3, 2012 at 11:41 AM, Matthew Knepley wrote: > On Mon, Dec 3, 2012 at 12:38 PM, Fande Kong wrote: > > Hi all, > > > > Can anyone guess the possible reason of the following errors: > > > > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in > > src/sys/utils/mpimesg.c > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in > > src/vec/vec/utils/vpscat.c > > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c > > Partial error messages are generally not helpful. > > Matt > > > I have been working for several days to figure out the reason, but now I > > still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I tried to > > use vecscatter to distribute the mesh. When the mesh was small, everything > > was ok. But when the mesh became larger about 14,000,000 elements, I got the > > above errors. > > > > -- > > Fande Kong > > Department of Computer Science > > University of Colorado at Boulder > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > > > > -- > Fande Kong > ShenZhen Institutes of Advanced Technology > Chinese Academy of Sciences > From knepley at gmail.com Mon Dec 3 13:20:54 2012 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 3 Dec 2012 13:20:54 -0600 Subject: [petsc-users] Solving A*X = B where A and B are matrices In-Reply-To: References: <55DD94DF-150F-4917-AA26-C0680107E065@mcs.anl.gov> Message-ID: On Mon, Dec 3, 2012 at 1:08 PM, Jelena Slivka wrote: > Thank you very much! > However, I have another question. I have a cluster of 4 nodes and each node > has 6 cores. If I run my code using 6 cores on one node (using the command > "mpiexec -n 6") it is much faster than running it on just one process (which > is expected). However, if I try running the code on multiple nodes (using > "mpiexec -f machinefile -ppn 4", where machinefile is the file which > contains the node names), it runs much slower than on just one process. This > also happens with tutorial examples. I have checked the number of iteration > for KSP solver when spread on multiple processors and it doesn't seem to be > the problem. Do you have any suggestions on what am I doing wrong? 
Are the > commands I am using wrong? Most operations are memory bandwidth limited, and it sounds like the memory bandwidth for your cluster is maxed out by 1-2 procs. Matt > On Sat, Dec 1, 2012 at 6:03 PM, Barry Smith wrote: >> >> >> We recommend following the directions >> http://www.mcs.anl.gov/petsc/documentation/faq.html#schurcomplement for >> computing a Schur complement; just skip the unneeded step. MUMPS supports a >> parallel Cholesky but you can also use a parallel LU with MUMPS, PaSTIX or >> SuperLU_Dist and those will work fine also. With current software Cholesky >> in parallel is not tons better than LU so generally not worth monkeying >> with. >> >> Barry >> >> >> On Dec 1, 2012, at 12:05 PM, Jelena Slivka wrote: >> >> > Hello! >> > I am trying to solve A*X = B where A and B are matrices, and then find >> > trace of the resulting matrix X. My approach has been to partition matrix B >> > in column vectors bi and then solve each system A*xi = bi. Then, for all >> > vectors xi I would extract i-th element xi(i) and sum those elements in >> > order to get Trace(X). >> > Pseudo-code: >> > 1) load matrices A and B >> > 2) transpose matrix B (so that each right-hand side bi is in the row, as >> > operation MatGetColumnVector is slow) >> > 3) set up KSPSolve >> > 4) create vector diagonal (in which xi(i) elements will be stored) >> > 5) for each row i of matrix B owned by current process: >> > - create vector bi by extracting row i from matrix B >> > - apply KSPsolve to get xi >> > - insert value xi(i) in diagonal vector (only the process >> > which >> > holds the ith value of vector x(i) should do so) >> > 6) sum vector diagonal to get the trace. >> > However, my code (attached, along with the test case) runs fine on one >> > process, but hangs if started on multiple processes. Could you please help >> > me figure out what am I doing wrong? >> > Also, could you please tell me is it possible to use Cholesky >> > factorization when running on multiple processes (I see that I cannot use it >> > when I set the format of matrix A to MPIAIJ)? >> > >> > >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From bsmith at mcs.anl.gov Mon Dec 3 13:21:24 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 3 Dec 2012 13:21:24 -0600 Subject: [petsc-users] Solving A*X = B where A and B are matrices In-Reply-To: References: <55DD94DF-150F-4917-AA26-C0680107E065@mcs.anl.gov> Message-ID: <8C7CCD12-F869-4FE8-9DEA-0BBA1283DAEC@mcs.anl.gov> http://www.mcs.anl.gov/petsc/documentation/faq.html#computers On Dec 3, 2012, at 1:08 PM, Jelena Slivka wrote: > Thank you very much! > However, I have another question. I have a cluster of 4 nodes and each node has 6 cores. If I run my code using 6 cores on one node (using the command "mpiexec -n 6") it is much faster than running it on just one process (which is expected). However, if I try running the code on multiple nodes (using "mpiexec -f machinefile -ppn 4", where machinefile is the file which contains the node names), it runs much slower than on just one process. This also happens with tutorial examples. I have checked the number of iteration for KSP solver when spread on multiple processors and it doesn't seem to be the problem. Do you have any suggestions on what am I doing wrong? Are the commands I am using wrong? 
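(Matt's memory-bandwidth point and the FAQ entry above can be checked directly: run a STREAM-style triad with 1 rank, with 6 ranks on one node, and then spread across nodes with the machinefile. If the aggregate rate stops growing, KSPSolve will not speed up either, and going multi-node additionally pays network latency on every inner product and norm. A small self-contained MPI version is sketched below; it is illustrative and untuned, not part of PETSc.)

   #include <mpi.h>
   #include <stdio.h>
   #include <stdlib.h>

   #define N 10000000                       /* three arrays of doubles: ~240 MB per rank */

   int main(int argc,char **argv)
   {
     double *a,*b,*c,t0,t1,rate,total;
     int    i,rank;

     MPI_Init(&argc,&argv);
     MPI_Comm_rank(MPI_COMM_WORLD,&rank);
     a = (double*)malloc(N*sizeof(double));
     b = (double*)malloc(N*sizeof(double));
     c = (double*)malloc(N*sizeof(double));
     for (i=0; i<N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

     MPI_Barrier(MPI_COMM_WORLD);
     t0 = MPI_Wtime();
     for (i=0; i<N; i++) a[i] = b[i] + 3.0*c[i];        /* STREAM-style triad */
     t1 = MPI_Wtime();

     rate = 3.0*N*sizeof(double)/(t1-t0)/1.0e6;         /* MB/s moved by this rank */
     MPI_Reduce(&rate,&total,1,MPI_DOUBLE,MPI_SUM,0,MPI_COMM_WORLD);
     if (!rank) printf("aggregate triad rate: %.0f MB/s (check value %g)\n",total,a[N-1]);

     free(a); free(b); free(c);
     MPI_Finalize();
     return 0;
   }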
> > > On Sat, Dec 1, 2012 at 6:03 PM, Barry Smith wrote: > > We recommend following the directions http://www.mcs.anl.gov/petsc/documentation/faq.html#schurcomplement for computing a Schur complement; just skip the unneeded step. MUMPS supports a parallel Cholesky but you can also use a parallel LU with MUMPS, PaSTIX or SuperLU_Dist and those will work fine also. With current software Cholesky in parallel is not tons better than LU so generally not worth monkeying with. > > Barry > > > On Dec 1, 2012, at 12:05 PM, Jelena Slivka wrote: > > > Hello! > > I am trying to solve A*X = B where A and B are matrices, and then find trace of the resulting matrix X. My approach has been to partition matrix B in column vectors bi and then solve each system A*xi = bi. Then, for all vectors xi I would extract i-th element xi(i) and sum those elements in order to get Trace(X). > > Pseudo-code: > > 1) load matrices A and B > > 2) transpose matrix B (so that each right-hand side bi is in the row, as operation MatGetColumnVector is slow) > > 3) set up KSPSolve > > 4) create vector diagonal (in which xi(i) elements will be stored) > > 5) for each row i of matrix B owned by current process: > > - create vector bi by extracting row i from matrix B > > - apply KSPsolve to get xi > > - insert value xi(i) in diagonal vector (only the process which > > holds the ith value of vector x(i) should do so) > > 6) sum vector diagonal to get the trace. > > However, my code (attached, along with the test case) runs fine on one process, but hangs if started on multiple processes. Could you please help me figure out what am I doing wrong? > > Also, could you please tell me is it possible to use Cholesky factorization when running on multiple processes (I see that I cannot use it when I set the format of matrix A to MPIAIJ)? > > > > > > From fd.kong at siat.ac.cn Mon Dec 3 13:23:39 2012 From: fd.kong at siat.ac.cn (Fande Kong) Date: Mon, 3 Dec 2012 12:23:39 -0700 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? In-Reply-To: <3416FED3-493A-42E6-83BB-EB661E69A90B@mcs.anl.gov> References: <3416FED3-493A-42E6-83BB-EB661E69A90B@mcs.anl.gov> Message-ID: Are there any constraints for IS and Vec? On Mon, Dec 3, 2012 at 12:19 PM, Barry Smith wrote: > > Perhaps some bad data is being passed into VecScatterCreate(). I would > suggest having SpmcsSFCreateVecScatter > validate the IS's and Vecs being passed in. For example, do the IS have > tons of duplicates, how long are they etc? 
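(A quick way to do the validation Barry suggests, as an untested sketch to drop in just before the VecScatterCreate() call inside SpmcsSFCreateVecScatter(); the names ix and x below stand for whichever IS and Vec are actually passed in, and ierr/CHKERRQ are assumed to be set up as usual.)

   PetscInt        nloc,nglob,imin,imax,xN,j,ndup = 0;
   PetscBool       sorted;
   PetscMPIInt     rank;
   IS              tmp;
   const PetscInt *idx;

   ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
   ierr = ISGetLocalSize(ix,&nloc);CHKERRQ(ierr);
   ierr = ISGetSize(ix,&nglob);CHKERRQ(ierr);
   ierr = ISGetMinMax(ix,&imin,&imax);CHKERRQ(ierr);
   ierr = ISSorted(ix,&sorted);CHKERRQ(ierr);
   ierr = VecGetSize(x,&xN);CHKERRQ(ierr);

   /* count duplicate indices in the local part of the IS */
   ierr = ISDuplicate(ix,&tmp);CHKERRQ(ierr);
   ierr = ISSort(tmp);CHKERRQ(ierr);
   ierr = ISGetIndices(tmp,&idx);CHKERRQ(ierr);
   for (j=1; j<nloc; j++) if (idx[j] == idx[j-1]) ndup++;
   ierr = ISRestoreIndices(tmp,&idx);CHKERRQ(ierr);
   ierr = ISDestroy(&tmp);CHKERRQ(ierr);

   ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
            "[%d] ix: local %D global %D min %D max %D sorted %d duplicates %D (x global size %D)\n",
            rank,nloc,nglob,imin,imax,(int)sorted,ndup,xN);CHKERRQ(ierr);
   ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD);CHKERRQ(ierr);
   if (imin < 0 || imax >= xN) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_OUTOFRANGE,"IS points outside the vector");

Indices outside the vector, huge duplicate counts, or wildly unbalanced local sizes are the usual culprits when VecScatterCreate() only fails past a certain problem size; on a 14,000,000-element mesh it may also be worth ruling out PetscInt overflow by configuring with --with-64-bit-indices.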
> > Barry > > On Dec 3, 2012, at 1:12 PM, Fande Kong wrote: > > > More details for the errors: > > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in > src/sys/utils/mpimesg.c > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in > src/vec/vec/utils/vpscat.c > > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c > > [0]PETSC ERROR: SpmcsSFCreateVecScatter() line 96 in SpmcsSFComm.cpp > > [0]PETSC ERROR: moveDataBetweenRootsAndLeaves() line 133 in > SpmcsSFComm.cpp > > [0]PETSC ERROR: SpmcsSFCreateNormalizedEmbeddedSF() line 359 in > SpmcsSFComm.cpp > > [0]PETSC ERROR: SpmcsSFDistributeSection() line 343 in SpmcsSection.cpp > > [0]PETSC ERROR: SpmcsMeshDistribute() line 444 in distributeMesh.cpp > > [0]PETSC ERROR: DMmeshInitialize() line 32 in mgInitialize.cpp > > [0]PETSC ERROR: main() line 64 in linearElasticity3d.cpp > > application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 > > > > > ===================================================================================== > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > > = EXIT CODE: 256 > > = CLEANING UP REMAINING PROCESSES > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > > ===================================================================================== > > [proxy:0:1 at node1778] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:1 at node1778] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:1 at node1778] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:2 at node1777] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:2 at node1777] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:2 at node1777] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:3 at node1773] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:3 at node1773] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:3 at node1773] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:4 at node1770] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:4 at node1770] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:4 at node1770] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:6 at node1760] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:6 at node1760] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:6 at node1760] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:7 at node1758] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:7 at node1758] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:7 at node1758] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:8 at node1738] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:8 at node1738] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:8 
at node1738] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:9 at node1736] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:9 at node1736] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:9 at node1736] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:10 at node1668] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:10 at node1668] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:10 at node1668] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:11 at node1667] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:11 at node1667] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:11 at node1667] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:12 at node1658] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:12 at node1658] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:12 at node1658] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:13 at node1656] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:13 at node1656] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:13 at node1656] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:14 at node1637] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:14 at node1637] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:14 at node1637] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:15 at node1636] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:15 at node1636] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:15 at node1636] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:16 at node1611] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:16 at node1611] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:16 at node1611] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:17 at node1380] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:17 at node1380] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:17 at node1380] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:18 at node1379] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:18 at node1379] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:18 at node1379] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:19 at node1378] HYD_pmcd_pmip_control_cmd_cb > 
(./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:19 at node1378] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:19 at node1378] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:20 at node1377] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:20 at node1377] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:20 at node1377] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:21 at node1376] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:21 at node1376] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:21 at node1376] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:22 at node1375] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:22 at node1375] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:22 at node1375] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:23 at node1374] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:23 at node1374] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:23 at node1374] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:24 at node1373] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:24 at node1373] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:24 at node1373] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:25 at node1372] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:25 at node1372] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:25 at node1372] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:26 at node1371] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:26 at node1371] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:26 at node1371] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:27 at node1370] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:27 at node1370] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:27 at node1370] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:28 at node1369] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:28 at node1369] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:28 at node1369] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:29 at node1368] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:29 at node1368] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback 
returned error status > > [proxy:0:29 at node1368] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:30 at node1367] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:30 at node1367] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:30 at node1367] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:31 at node1366] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:31 at node1366] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:31 at node1366] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [mpiexec at node1780] HYDT_bscu_wait_for_completion > (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated > badly; aborting > > [mpiexec at node1780] HYDT_bsci_wait_for_completion > (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for > completion > > [mpiexec at node1780] HYD_pmci_wait_for_completion > (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for > completion > > [mpiexec at node1780] main (./ui/mpich/mpiexec.c:385): process manager > error waiting for completion > > > > It seems nothing. > > > > On Mon, Dec 3, 2012 at 11:41 AM, Matthew Knepley > wrote: > > On Mon, Dec 3, 2012 at 12:38 PM, Fande Kong > wrote: > > > Hi all, > > > > > > Can anyone guess the possible reason of the following errors: > > > > > > > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in > > > src/sys/utils/mpimesg.c > > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in > > > src/vec/vec/utils/vpscat.c > > > [0]PETSC ERROR: VecScatterCreate() line 1431 in > src/vec/vec/utils/vscat.c > > > > Partial error messages are generally not helpful. > > > > Matt > > > > > I have been working for several days to figure out the reason, but now > I > > > still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I > tried to > > > use vecscatter to distribute the mesh. When the mesh was small, > everything > > > was ok. But when the mesh became larger about 14,000,000 elements, I > got the > > > above errors. > > > > > > -- > > > Fande Kong > > > Department of Computer Science > > > University of Colorado at Boulder > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > > > > > > > -- > > Fande Kong > > ShenZhen Institutes of Advanced Technology > > Chinese Academy of Sciences > > > > > -- Fande Kong ShenZhen Institutes of Advanced Technology Chinese Academy of Sciences -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Dec 3 13:25:01 2012 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 3 Dec 2012 13:25:01 -0600 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? In-Reply-To: References: <3416FED3-493A-42E6-83BB-EB661E69A90B@mcs.anl.gov> Message-ID: On Mon, Dec 3, 2012 at 1:23 PM, Fande Kong wrote: > Are there any constraints for IS and Vec? No, but this appears to be inconsistency. Matt > On Mon, Dec 3, 2012 at 12:19 PM, Barry Smith wrote: >> >> >> Perhaps some bad data is being passed into VecScatterCreate(). 
I would >> suggest having SpmcsSFCreateVecScatter >> validate the IS's and Vecs being passed in. For example, do the IS have >> tons of duplicates, how long are they etc? >> >> Barry >> >> On Dec 3, 2012, at 1:12 PM, Fande Kong wrote: >> >> > More details for the errors: >> > >> > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in >> > src/sys/utils/mpimesg.c >> > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in >> > src/vec/vec/utils/vpscat.c >> > [0]PETSC ERROR: VecScatterCreate() line 1431 in >> > src/vec/vec/utils/vscat.c >> > [0]PETSC ERROR: SpmcsSFCreateVecScatter() line 96 in SpmcsSFComm.cpp >> > [0]PETSC ERROR: moveDataBetweenRootsAndLeaves() line 133 in >> > SpmcsSFComm.cpp >> > [0]PETSC ERROR: SpmcsSFCreateNormalizedEmbeddedSF() line 359 in >> > SpmcsSFComm.cpp >> > [0]PETSC ERROR: SpmcsSFDistributeSection() line 343 in SpmcsSection.cpp >> > [0]PETSC ERROR: SpmcsMeshDistribute() line 444 in distributeMesh.cpp >> > [0]PETSC ERROR: DMmeshInitialize() line 32 in mgInitialize.cpp >> > [0]PETSC ERROR: main() line 64 in linearElasticity3d.cpp >> > application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 >> > >> > >> > ===================================================================================== >> > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >> > = EXIT CODE: 256 >> > = CLEANING UP REMAINING PROCESSES >> > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >> > >> > ===================================================================================== >> > [proxy:0:1 at node1778] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:1 at node1778] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:1 at node1778] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:2 at node1777] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:2 at node1777] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:2 at node1777] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:3 at node1773] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:3 at node1773] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:3 at node1773] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:4 at node1770] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:4 at node1770] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:4 at node1770] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:6 at node1760] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:6 at node1760] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:6 at node1760] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:7 at node1758] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:7 at node1758] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:7 at node1758] main (./pm/pmiserv/pmip.c:214): 
demux engine error >> > waiting for event >> > [proxy:0:8 at node1738] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:8 at node1738] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:8 at node1738] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:9 at node1736] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:9 at node1736] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:9 at node1736] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:10 at node1668] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:10 at node1668] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:10 at node1668] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:11 at node1667] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:11 at node1667] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:11 at node1667] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:12 at node1658] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:12 at node1658] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:12 at node1658] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:13 at node1656] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:13 at node1656] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:13 at node1656] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:14 at node1637] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:14 at node1637] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:14 at node1637] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:15 at node1636] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:15 at node1636] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:15 at node1636] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:16 at node1611] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:16 at node1611] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:16 at node1611] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:17 at node1380] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:17 at node1380] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:17 at node1380] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > 
[proxy:0:18 at node1379] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:18 at node1379] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:18 at node1379] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:19 at node1378] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:19 at node1378] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:19 at node1378] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:20 at node1377] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:20 at node1377] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:20 at node1377] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:21 at node1376] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:21 at node1376] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:21 at node1376] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:22 at node1375] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:22 at node1375] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:22 at node1375] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:23 at node1374] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:23 at node1374] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:23 at node1374] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:24 at node1373] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:24 at node1373] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:24 at node1373] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:25 at node1372] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:25 at node1372] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:25 at node1372] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:26 at node1371] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:26 at node1371] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:26 at node1371] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:27 at node1370] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:27 at node1370] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:27 at node1370] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:28 at node1369] 
HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:28 at node1369] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:28 at node1369] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:29 at node1368] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:29 at node1368] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:29 at node1368] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:30 at node1367] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:30 at node1367] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:30 at node1367] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:31 at node1366] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:31 at node1366] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:31 at node1366] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [mpiexec at node1780] HYDT_bscu_wait_for_completion >> > (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated >> > badly; aborting >> > [mpiexec at node1780] HYDT_bsci_wait_for_completion >> > (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for >> > completion >> > [mpiexec at node1780] HYD_pmci_wait_for_completion >> > (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for >> > completion >> > [mpiexec at node1780] main (./ui/mpich/mpiexec.c:385): process manager >> > error waiting for completion >> > >> > It seems nothing. >> > >> > On Mon, Dec 3, 2012 at 11:41 AM, Matthew Knepley >> > wrote: >> > On Mon, Dec 3, 2012 at 12:38 PM, Fande Kong >> > wrote: >> > > Hi all, >> > > >> > > Can anyone guess the possible reason of the following errors: >> > > >> > > >> > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in >> > > src/sys/utils/mpimesg.c >> > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in >> > > src/vec/vec/utils/vpscat.c >> > > [0]PETSC ERROR: VecScatterCreate() line 1431 in >> > > src/vec/vec/utils/vscat.c >> > >> > Partial error messages are generally not helpful. >> > >> > Matt >> > >> > > I have been working for several days to figure out the reason, but now >> > > I >> > > still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I >> > > tried to >> > > use vecscatter to distribute the mesh. When the mesh was small, >> > > everything >> > > was ok. But when the mesh became larger about 14,000,000 elements, I >> > > got the >> > > above errors. >> > > >> > > -- >> > > Fande Kong >> > > Department of Computer Science >> > > University of Colorado at Boulder >> > > >> > > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments is infinitely more interesting than any results to which >> > their experiments lead. 
>> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > Fande Kong >> > ShenZhen Institutes of Advanced Technology >> > Chinese Academy of Sciences >> > >> >> > > > > -- > Fande Kong > ShenZhen Institutes of Advanced Technology > Chinese Academy of Sciences > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From bsmith at mcs.anl.gov Mon Dec 3 13:27:54 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 3 Dec 2012 13:27:54 -0600 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? In-Reply-To: References: <3416FED3-493A-42E6-83BB-EB661E69A90B@mcs.anl.gov> Message-ID: <95B914B5-3DCB-4BD1-BB87-3ED00221847B@mcs.anl.gov> On Dec 3, 2012, at 1:23 PM, Fande Kong wrote: > Are there any constraints for IS and Vec? You could also run with the option -mpi_return_on_error false and MPI may print an error message of what it thinks has gone wrong. Barry > > On Mon, Dec 3, 2012 at 12:19 PM, Barry Smith wrote: > > Perhaps some bad data is being passed into VecScatterCreate(). I would suggest having SpmcsSFCreateVecScatter > validate the IS's and Vecs being passed in. For example, do the IS have tons of duplicates, how long are they etc? > > Barry > > On Dec 3, 2012, at 1:12 PM, Fande Kong wrote: > > > More details for the errors: > > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in src/sys/utils/mpimesg.c > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in src/vec/vec/utils/vpscat.c > > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c > > [0]PETSC ERROR: SpmcsSFCreateVecScatter() line 96 in SpmcsSFComm.cpp > > [0]PETSC ERROR: moveDataBetweenRootsAndLeaves() line 133 in SpmcsSFComm.cpp > > [0]PETSC ERROR: SpmcsSFCreateNormalizedEmbeddedSF() line 359 in SpmcsSFComm.cpp > > [0]PETSC ERROR: SpmcsSFDistributeSection() line 343 in SpmcsSection.cpp > > [0]PETSC ERROR: SpmcsMeshDistribute() line 444 in distributeMesh.cpp > > [0]PETSC ERROR: DMmeshInitialize() line 32 in mgInitialize.cpp > > [0]PETSC ERROR: main() line 64 in linearElasticity3d.cpp > > application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 > > > > ===================================================================================== > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > > = EXIT CODE: 256 > > = CLEANING UP REMAINING PROCESSES > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > ===================================================================================== > > [proxy:0:1 at node1778] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:1 at node1778] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:1 at node1778] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:2 at node1777] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:2 at node1777] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:2 at node1777] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:3 at node1773] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:3 at node1773] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:3 at node1773] main (./pm/pmiserv/pmip.c:214): demux engine error 
waiting for event > > [proxy:0:4 at node1770] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:4 at node1770] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:4 at node1770] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:6 at node1760] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:6 at node1760] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:6 at node1760] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:7 at node1758] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:7 at node1758] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:7 at node1758] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:8 at node1738] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:8 at node1738] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:8 at node1738] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:9 at node1736] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:9 at node1736] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:9 at node1736] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:10 at node1668] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:10 at node1668] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:10 at node1668] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:11 at node1667] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:11 at node1667] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:11 at node1667] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:12 at node1658] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:12 at node1658] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:12 at node1658] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:13 at node1656] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:13 at node1656] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:13 at node1656] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:14 at node1637] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:14 at node1637] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:14 at node1637] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:15 at node1636] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:15 at node1636] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned 
error status > > [proxy:0:15 at node1636] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:16 at node1611] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:16 at node1611] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:16 at node1611] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:17 at node1380] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:17 at node1380] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:17 at node1380] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:18 at node1379] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:18 at node1379] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:18 at node1379] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:19 at node1378] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:19 at node1378] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:19 at node1378] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:20 at node1377] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:20 at node1377] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:20 at node1377] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:21 at node1376] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:21 at node1376] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:21 at node1376] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:22 at node1375] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:22 at node1375] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:22 at node1375] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:23 at node1374] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:23 at node1374] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:23 at node1374] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:24 at node1373] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:24 at node1373] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:24 at node1373] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:25 at node1372] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:25 at node1372] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:25 at node1372] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:26 at node1371] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > 
> [proxy:0:26 at node1371] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:26 at node1371] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:27 at node1370] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:27 at node1370] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:27 at node1370] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:28 at node1369] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:28 at node1369] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:28 at node1369] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:29 at node1368] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:29 at node1368] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:29 at node1368] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:30 at node1367] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:30 at node1367] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:30 at node1367] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:31 at node1366] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:31 at node1366] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:31 at node1366] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [mpiexec at node1780] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting > > [mpiexec at node1780] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion > > [mpiexec at node1780] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for completion > > [mpiexec at node1780] main (./ui/mpich/mpiexec.c:385): process manager error waiting for completion > > > > It seems nothing. > > > > On Mon, Dec 3, 2012 at 11:41 AM, Matthew Knepley wrote: > > On Mon, Dec 3, 2012 at 12:38 PM, Fande Kong wrote: > > > Hi all, > > > > > > Can anyone guess the possible reason of the following errors: > > > > > > > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in > > > src/sys/utils/mpimesg.c > > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in > > > src/vec/vec/utils/vpscat.c > > > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c > > > > Partial error messages are generally not helpful. > > > > Matt > > > > > I have been working for several days to figure out the reason, but now I > > > still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I tried to > > > use vecscatter to distribute the mesh. When the mesh was small, everything > > > was ok. But when the mesh became larger about 14,000,000 elements, I got the > > > above errors. 
> > > > > > -- > > > Fande Kong > > > Department of Computer Science > > > University of Colorado at Boulder > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > > > > > > > -- > > Fande Kong > > ShenZhen Institutes of Advanced Technology > > Chinese Academy of Sciences > > > > > > > > -- > Fande Kong > ShenZhen Institutes of Advanced Technology > Chinese Academy of Sciences > From agrayver at gfz-potsdam.de Tue Dec 4 05:07:33 2012 From: agrayver at gfz-potsdam.de (Alexander Grayver) Date: Tue, 04 Dec 2012 12:07:33 +0100 Subject: [petsc-users] valgrind complains about string functions In-Reply-To: References: <50BC9D0A.2040803@gfz-potsdam.de> Message-ID: <50BDD975.5010100@gfz-potsdam.de> Jed, Satish, suppression file is a nice option, thanks. On 03.12.2012 17:53, Satish Balay wrote: > On Mon, 3 Dec 2012, Alexander Grayver wrote: > >> Hello, >> >> I'm using PETSc-3.3-p4 compiled with ICC 12.0 + IntelMPI 4.0.3 and getting a >> bunch of the errors related to the string functions: >> >> ==22020== Conditional jump or move depends on uninitialised value(s) >> ==22020== at 0x4D3109: __intel_sse2_strcpy (in /home/main) >> ==22020== by 0xE87D51D: PetscStrcpy (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE87B6A4: PetscStrallocpy (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE796769: PetscFListGetPathAndFunction (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE79652A: PetscFListAdd (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE64ACB8: MatMFFDRegister (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE64FA7D: MatMFFDRegisterAll (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE64F65B: MatMFFDInitializePackage (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE48D8C2: MatInitializePackage (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE5157DB: MatCreate (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE29A74C: MatCreateSeqAIJ (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> >> Same thing for PetscStrncat etc. >> >> There was similar question two years ago in this mailing list and advice was >> to use a different compiler. It is not an option for me. > You can always use a separate build of PETSc with gcc,--download-mpich > to get a valgrind clean build [for debugging purposes] > >> Thus, my question is can those errors potentially cause any serious troubles? > Generally we can ignore issues valgrind finds in system/compiler > libraries. [Jed has a valid explanation for this one]. > > And generally valgrind provides 'default suppression files' for known > glibc versions. But for such issues as with ifc, you can ask valgrind > to create a supression file - and then rerun valgrind with this custom > supression file - to get more readable output. > > Satish > >> I came across with time trying to debug a weird segmentation fault. >> >> Thanks. 
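To make the suppression-file route concrete: valgrind can be run once with --gen-suppressions=all to print ready-made suppression blocks, which can then be pasted into a file and passed back on later runs with --suppressions=<file>. An entry matching the report quoted above might look roughly like the following; the entry name and the exact frame list are illustrative, not generated output:

    {
       intel_sse2_strcpy_uninit_cond
       Memcheck:Cond
       fun:__intel_sse2_strcpy
       fun:PetscStrcpy
       ...
    }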
>> >> -- Regards, Alexander From gokhalen at gmail.com Thu Dec 6 12:33:25 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 6 Dec 2012 13:33:25 -0500 Subject: [petsc-users] real and imaginary part of a number Message-ID: Does petsc provide functions to get real and imaginary parts of a number? I couldn't seem to find any functions in http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/index.html or in the vec collective either. Cheers, -Nachiket -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Thu Dec 6 12:35:02 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Thu, 6 Dec 2012 10:35:02 -0800 Subject: [petsc-users] real and imaginary part of a number In-Reply-To: References: Message-ID: PetscRealPart() and PetscImaginaryPart() It looks like none of the math functions have man pages. On Thu, Dec 6, 2012 at 10:33 AM, Nachiket Gokhale wrote: > Does petsc provide functions to get real and imaginary parts of a number? > I couldn't seem to find any functions in > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/index.html > > or in the vec collective either. > > Cheers, > > -Nachiket > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Thu Dec 6 12:46:42 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Thu, 6 Dec 2012 10:46:42 -0800 Subject: [petsc-users] real and imaginary part of a number In-Reply-To: References: Message-ID: 1. *Always* reply to the list, not me personally. 2. No, but there is VecConjugate(). On Thu, Dec 6, 2012 at 10:39 AM, Nachiket Gokhale wrote: > Thanks! And are there corresponding functions for a vector? VecRealPart > and VecImagPart? > > -Nachiket > > On Thu, Dec 6, 2012 at 1:35 PM, Jed Brown wrote: > >> PetscRealPart() and PetscImaginaryPart() >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.witkowski at tu-dresden.de Sat Dec 8 07:59:12 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Sat, 08 Dec 2012 14:59:12 +0100 Subject: [petsc-users] Creating explicit matrix scatter Message-ID: <50C347B0.3020300@tu-dresden.de> A have a distributed MATAIJ, which is non square. I want to create a new matrix, which has the same col layout but a different row layout and should be scattered from the original matrix. Thus, each rank should collect some rows, which may be non local in the original matrix, to its own local part of the new matrix. After creating the new matrix, I need not only to make some MatMult, but I need local access to the matrix rows. How to do this? Thanks for any advise. Thomas From jedbrown at mcs.anl.gov Sat Dec 8 08:12:48 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 8 Dec 2012 06:12:48 -0800 Subject: [petsc-users] Creating explicit matrix scatter In-Reply-To: <50C347B0.3020300@tu-dresden.de> References: <50C347B0.3020300@tu-dresden.de> Message-ID: MatGetSubMatrix() and later, MatGetRow() On Sat, Dec 8, 2012 at 5:59 AM, Thomas Witkowski < thomas.witkowski at tu-dresden.de> wrote: > A have a distributed MATAIJ, which is non square. I want to create a new > matrix, which has the same col layout but a different row layout and should > be scattered from the original matrix. Thus, each rank should collect some > rows, which may be non local in the original matrix, to its own local part > of the new matrix. After creating the new matrix, I need not only to make > some MatMult, but I need local access to the matrix rows. 
How to do this? > Thanks for any advise. > > Thomas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.witkowski at tu-dresden.de Sat Dec 8 08:33:57 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Sat, 08 Dec 2012 15:33:57 +0100 Subject: [petsc-users] Creating explicit matrix scatter In-Reply-To: References: <50C347B0.3020300@tu-dresden.de> Message-ID: <50C34FD5.8050702@tu-dresden.de> I checked the documentation of MatGetSubMatrix() and found the following: "The rows in isrow will be sorted into the same order as the original matrix on each process." For my case, this will be wrong. I need to say each task exactly which row from the old matrix should be which row in the new matrix. Any other possibility to do this? Thomas Am 08.12.2012 15:12, schrieb Jed Brown: > MatGetSubMatrix() and later, MatGetRow() > > > On Sat, Dec 8, 2012 at 5:59 AM, Thomas Witkowski > > wrote: > > A have a distributed MATAIJ, which is non square. I want to create > a new matrix, which has the same col layout but a different row > layout and should be scattered from the original matrix. Thus, > each rank should collect some rows, which may be non local in the > original matrix, to its own local part of the new matrix. After > creating the new matrix, I need not only to make some MatMult, but > I need local access to the matrix rows. How to do this? Thanks for > any advise. > > Thomas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sat Dec 8 08:37:26 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 8 Dec 2012 06:37:26 -0800 Subject: [petsc-users] Creating explicit matrix scatter In-Reply-To: <50C34FD5.8050702@tu-dresden.de> References: <50C347B0.3020300@tu-dresden.de> <50C34FD5.8050702@tu-dresden.de> Message-ID: How about MatPermute() in a suitable place? I don't know why you would need such a thing. On Sat, Dec 8, 2012 at 6:33 AM, Thomas Witkowski < thomas.witkowski at tu-dresden.de> wrote: > I checked the documentation of MatGetSubMatrix() and found the following: > > "The rows in isrow will be sorted into the same order as the original > matrix on each process." > > For my case, this will be wrong. I need to say each task exactly which row > from the old matrix should be which row in the new matrix. Any other > possibility to do this? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.witkowski at tu-dresden.de Sat Dec 8 09:11:16 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Sat, 08 Dec 2012 16:11:16 +0100 Subject: [petsc-users] Creating explicit matrix scatter In-Reply-To: References: <50C347B0.3020300@tu-dresden.de> <50C34FD5.8050702@tu-dresden.de> Message-ID: <50C35894.3050509@tu-dresden.de> Am 08.12.2012 15:37, schrieb Jed Brown: > How about MatPermute() in a suitable place? Not really. I thing, I will try to solve the problem differently and avoid this matrix construction. Thomas > > I don't know why you would need such a thing. > > On Sat, Dec 8, 2012 at 6:33 AM, Thomas Witkowski > > wrote: > > I checked the documentation of MatGetSubMatrix() and found the > following: > > "The rows in isrow will be sorted into the same order as the > original matrix on each process." > > For my case, this will be wrong. I need to say each task exactly > which row from the old matrix should be which row in the new > matrix. Any other possibility to do this? 
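For completeness, a rough sketch of the MatGetSubMatrix()/MatGetRow() route being discussed; the variable names are invented, and the exact semantics of the column IS in parallel are worth checking against the MatGetSubMatrix() man page for petsc-3.3:

    IS       isrow, iscol;
    Mat      Asub;
    PetscInt i, rstart, rend;

    /* newrows[] lists, per rank, the global rows of the old matrix this rank
       wants in its local part of the new matrix; ncols_local/firstcol describe
       this rank's share of the (unchanged) column layout. */
    ISCreateGeneral(PETSC_COMM_WORLD, nrows_local, newrows, PETSC_COPY_VALUES, &isrow);
    ISCreateStride(PETSC_COMM_WORLD, ncols_local, firstcol, 1, &iscol);
    MatGetSubMatrix(Aold, isrow, iscol, MAT_INITIAL_MATRIX, &Asub);

    /* local, read-only access to the rows afterwards */
    MatGetOwnershipRange(Asub, &rstart, &rend);
    for (i = rstart; i < rend; i++) {
      PetscInt          ncols;
      const PetscInt    *cols;
      const PetscScalar *vals;
      MatGetRow(Asub, i, &ncols, &cols, &vals);
      /* ... use row i ... */
      MatRestoreRow(Asub, i, &ncols, &cols, &vals);
    }

As the quoted documentation says, the extracted rows keep the ordering of the original matrix within each process, so a specific target ordering would still need a MatPermute() (or a different construction) afterwards.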
> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sat Dec 8 09:15:07 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 8 Dec 2012 07:15:07 -0800 Subject: [petsc-users] Creating explicit matrix scatter In-Reply-To: <50C35894.3050509@tu-dresden.de> References: <50C347B0.3020300@tu-dresden.de> <50C34FD5.8050702@tu-dresden.de> <50C35894.3050509@tu-dresden.de> Message-ID: On Sat, Dec 8, 2012 at 7:11 AM, Thomas Witkowski < thomas.witkowski at tu-dresden.de> wrote: > Am 08.12.2012 15:37, schrieb Jed Brown: > > How about MatPermute() in a suitable place? > > Not really. I thing, It should work... > I will try to solve the problem differently and avoid this matrix > construction. ... but I think this is better. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.arndt at stud.uni-goettingen.de Tue Dec 11 08:19:14 2012 From: daniel.arndt at stud.uni-goettingen.de (Daniel Arndt) Date: Tue, 11 Dec 2012 15:19:14 +0100 Subject: [petsc-users] early convergence failure Message-ID: <50C740E2.4080105@stud.uni-goettingen.de> Hello everyone, at the moment I'm trying to solve a Poisson problem with SIPG stabilization and discontinuous finite elements. The matrix is constructed in deal.II. When I try to solve this problem with PETSc's CG solver and a BlockJacobi preconditioner or a BoomerAMG preconditioner from the Hypre package I get this weird error message. Exception on processing: Iterative method reported convergence failure in step 3 with residual 1.50616 Aborting! Since the solver is allowed to take 5000 steps this convergence failure is clearly early. Did anyone encounter such an error before? What can produce such an early convergence failure? Thanks in advance, Daniel From bsmith at mcs.anl.gov Tue Dec 11 08:28:52 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 11 Dec 2012 08:28:52 -0600 Subject: [petsc-users] early convergence failure In-Reply-To: <50C740E2.4080105@stud.uni-goettingen.de> References: <50C740E2.4080105@stud.uni-goettingen.de> Message-ID: Daniel, That message is not coming from PETSc so likely Deal.II is processing the result from KSPConvergedReason() and generating that less then totally useful output. If you run with -ksp_converged_reason PETSC will (if Deal.II processes PETSc options correctly) print a more complete reason. Off hand I am guessing that CG detected a non-symmetric or indefinite matrix or preconditioner which it cannot handle so it barfed out. You can run with GMRES instead of CG and if that converges then this is the likely explanation. Barry On Dec 11, 2012, at 8:19 AM, Daniel Arndt wrote: > Hello everyone, > > at the moment I'm trying to solve a Poisson problem with SIPG > stabilization and discontinuous finite elements. The matrix is > constructed in deal.II. When I try to solve this problem with PETSc's CG > solver and a BlockJacobi preconditioner or a BoomerAMG preconditioner > from the Hypre package I get this weird error message. > > Exception on processing: > Iterative method reported convergence failure in step 3 with residual > 1.50616 > Aborting! > > Since the solver is allowed to take 5000 steps this convergence failure > is clearly early. Did anyone encounter such an error before? What can > produce such an early convergence failure? 
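A concrete form of the two suggestions above, assuming the application forwards command-line options to PetscInitialize(): run with -ksp_type gmres -ksp_converged_reason (optionally -ksp_monitor_true_residual), or query the reason directly after the solve. The snippet below is only a sketch with error checking omitted:

    KSPConvergedReason reason;

    KSPSolve(ksp, b, x);
    KSPGetConvergedReason(ksp, &reason);
    if (reason < 0) {
      PetscPrintf(PETSC_COMM_WORLD, "KSP diverged with reason %d\n", (int)reason);
    }

A negative reason corresponds to one of the KSP_DIVERGED_* values in petscksp.h, which is more informative than the generic message printed by the application.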
> > Thanks in advance, > Daniel > > > From daniel.arndt at stud.uni-goettingen.de Tue Dec 11 10:00:41 2012 From: daniel.arndt at stud.uni-goettingen.de (Daniel Arndt) Date: Tue, 11 Dec 2012 17:00:41 +0100 Subject: [petsc-users] early convergence failure In-Reply-To: References: Message-ID: <50C758A9.4010408@stud.uni-goettingen.de> Thank you Barry for your suggestions. The error I get is now KSP_DIVERGED_INDEFINITE_PC. The matrix that I try to invert is actually symmetric and positive definite. I was not aware that this can lead to an indefinite preconditioner. If I use a Jacobi preconditioner or tell BoomerAMG that the matrix is symmetric I don't encounter any errors. So I'm quiet for now :-) Daniel > Daniel, > > That message is not coming from PETSc so likely Deal.II is processing the result from KSPConvergedReason() and generating that less then totally useful output. If you run with -ksp_converged_reason PETSC will (if Deal.II processes PETSc options correctly) print a more complete reason. > > Off hand I am guessing that CG detected a non-symmetric or indefinite matrix or preconditioner which it cannot handle so it barfed out. You can run with GMRES instead of CG and if that converges then this is the likely explanation. > > Barry > > > On Dec 11, 2012, at 8:19 AM, Daniel Arndt > wrote: > > > > Hello everyone, > > > > at the moment I'm trying to solve a Poisson problem with SIPG > > stabilization and discontinuous finite elements. The matrix is > > constructed in deal.II. When I try to solve this problem with PETSc's CG > > solver and a BlockJacobi preconditioner or a BoomerAMG preconditioner > > from the Hypre package I get this weird error message. > > > > Exception on processing: > > Iterative method reported convergence failure in step 3 with residual > > 1.50616 > > Aborting! > > > > Since the solver is allowed to take 5000 steps this convergence failure > > is clearly early. Did anyone encounter such an error before? What can > > produce such an early convergence failure? > > > > Thanks in advance, > > Daniel > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Dec 11 10:26:54 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 11 Dec 2012 10:26:54 -0600 Subject: [petsc-users] early convergence failure In-Reply-To: <50C758A9.4010408@stud.uni-goettingen.de> References: <50C758A9.4010408@stud.uni-goettingen.de> Message-ID: <7B5C8583-1A21-49C3-B8B3-FF708B84D4F8@mcs.anl.gov> On Dec 11, 2012, at 10:00 AM, Daniel Arndt wrote: > Thank you Barry for your suggestions. > > The error I get is now KSP_DIVERGED_INDEFINITE_PC. The matrix that I try to invert is actually symmetric and positive definite. I was not aware that this can lead to an indefinite preconditioner. Absolutely. Many preconditioners do not retain this feature even in exact precision and with numerical effects it can even appear unexpected. By default BoomerAMG doesn't retain this. > If I use a Jacobi preconditioner or tell BoomerAMG that the matrix is symmetric I don't encounter any errors. So I'm quiet for now :-) > > Daniel >> Daniel, >> >> That message is not coming from PETSc so likely Deal.II is processing the result from KSPConvergedReason() and generating that less then totally useful output. If you run with -ksp_converged_reason PETSC will (if Deal.II processes PETSc options correctly) print a more complete reason. 
>> >> Off hand I am guessing that CG detected a non-symmetric or indefinite matrix or preconditioner which it cannot handle so it barfed out. You can run with GMRES instead of CG and if that converges then this is the likely explanation. >> >> Barry >> >> >> On Dec 11, 2012, at 8:19 AM, Daniel Arndt < >> daniel.arndt at stud.uni-goettingen.de >> > wrote: >> >> > >> Hello everyone, >> >> > >> >> >> > >> at the moment I'm trying to solve a Poisson problem with SIPG >> >> > >> stabilization and discontinuous finite elements. The matrix is >> >> > >> constructed in deal.II. When I try to solve this problem with PETSc's CG >> >> > >> solver and a BlockJacobi preconditioner or a BoomerAMG preconditioner >> >> > >> from the Hypre package I get this weird error message. >> >> > >> >> >> > >> Exception on processing: >> >> > >> Iterative method reported convergence failure in step 3 with residual >> >> > >> 1.50616 >> >> > >> Aborting! >> >> > >> >> >> > >> Since the solver is allowed to take 5000 steps this convergence failure >> >> > >> is clearly early. Did anyone encounter such an error before? What can >> >> > >> produce such an early convergence failure? >> >> > >> >> >> > >> Thanks in advance, >> >> > >> Daniel >> >> > >> >> >> > >> >> >> > >> >> >> From ling.zou at inl.gov Tue Dec 11 15:34:01 2012 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 11 Dec 2012 14:34:01 -0700 Subject: [petsc-users] how to control snes_mf_operator Message-ID: Dear All, I have recently had an issue using snes_mf_operator. I've tried to figure it out from PETSc manual and PETSc website but didn't get any luck, so I submit my question here and hope some one could help me out. (1) ================================================================= A little bit background here: my problem has 7 variables, i.e., U = [U0, U1, U2, U3, U4, U5, U6] U0 is in the order of 1. U1, U2, U4 and U5 in the oder of 100. U3 and U6 are in the order of 1.e8. I believe this should be quite common for most PETSc users. (2) ================================================================= My problem here is, U0, by its physical meaning, has to be limited between 0 and 1. When PETSc starts to perturb the initial solution of U (which I believe properly set) to approximate the operation of J (dU), the U0 get a perturbation size in the order of 100, which causes problem as U0 has to be smaller than 1. >From my observation, this same perturbation size, say eps, is applied on all U0, U1, U2, etc. <=== Is this the default setting? I also guess that this eps, in the order of 100, is determined from my initial solution vector and other related PETSc parameters. <=== Is my guessing right? (3) ================================================================= My question: I'd like to avoid a perturbation size ~100 on U0, i.e., I have to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 situation. Is there any way to control that? Or, is there any advanced option to control the perturbation size on different variables when using snes_mf_operator? Hope my explanation is clear. Please let me know if it is not. Best Regards, Ling -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Dec 11 15:40:37 2012 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 Dec 2012 13:40:37 -0800 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling wrote: > Dear All, > > I have recently had an issue using snes_mf_operator. I've tried to figure it > out from PETSc manual and PETSc website but didn't get any luck, so I submit > my question here and hope some one could help me out. > > (1) > ================================================================= > A little bit background here: my problem has 7 variables, i.e., > > U = [U0, U1, U2, U3, U4, U5, U6] > > U0 is in the order of 1. > U1, U2, U4 and U5 in the oder of 100. > U3 and U6 are in the order of 1.e8. > > I believe this should be quite common for most PETSc users. > > (2) > ================================================================= > My problem here is, U0, by its physical meaning, has to be limited between 0 > and 1. When PETSc starts to perturb the initial solution of U (which I > believe properly set) to approximate the operation of J (dU), the U0 get a > perturbation size in the order of 100, which causes problem as U0 has to be > smaller than 1. > > From my observation, this same perturbation size, say eps, is applied on all > U0, U1, U2, etc. <=== Is this the default setting? > I also guess that this eps, in the order of 100, is determined from my > initial solution vector and other related PETSc parameters. <=== Is my > guessing right? > > (3) > ================================================================= > My question: I'd like to avoid a perturbation size ~100 on U0, i.e., I have > to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 > situation. Is there any way to control that? > Or, is there any advanced option to control the perturbation size on > different variables when using snes_mf_operator? Here is a description of the algorithm for calculating h. It seems to me a better way to do this is to non-dimensionalize first. Matt > > Hope my explanation is clear. Please let me know if it is not. > > > Best Regards, > > Ling > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From knepley at gmail.com Tue Dec 11 15:41:00 2012 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 Dec 2012 13:41:00 -0800 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley wrote: > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling wrote: >> Dear All, >> >> I have recently had an issue using snes_mf_operator. I've tried to figure it >> out from PETSc manual and PETSc website but didn't get any luck, so I submit >> my question here and hope some one could help me out. >> >> (1) >> ================================================================= >> A little bit background here: my problem has 7 variables, i.e., >> >> U = [U0, U1, U2, U3, U4, U5, U6] >> >> U0 is in the order of 1. >> U1, U2, U4 and U5 in the oder of 100. >> U3 and U6 are in the order of 1.e8. >> >> I believe this should be quite common for most PETSc users. >> >> (2) >> ================================================================= >> My problem here is, U0, by its physical meaning, has to be limited between 0 >> and 1. 
When PETSc starts to perturb the initial solution of U (which I >> believe properly set) to approximate the operation of J (dU), the U0 get a >> perturbation size in the order of 100, which causes problem as U0 has to be >> smaller than 1. >> >> From my observation, this same perturbation size, say eps, is applied on all >> U0, U1, U2, etc. <=== Is this the default setting? >> I also guess that this eps, in the order of 100, is determined from my >> initial solution vector and other related PETSc parameters. <=== Is my >> guessing right? >> >> (3) >> ================================================================= >> My question: I'd like to avoid a perturbation size ~100 on U0, i.e., I have >> to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 >> situation. Is there any way to control that? >> Or, is there any advanced option to control the perturbation size on >> different variables when using snes_mf_operator? > > Here is a description of the algorithm for calculating h. It seems to > me a better way to do this > is to non-dimensionalize first. I forgot the URL: http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD Matt > Matt > >> >> Hope my explanation is clear. Please let me know if it is not. >> >> >> Best Regards, >> >> Ling >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From ling.zou at inl.gov Tue Dec 11 15:47:08 2012 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 11 Dec 2012 14:47:08 -0700 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: thank you Matt. I will try to figure it out. Non-dimensionalization is certainly something worth to try. Best, Ling On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley wrote: > On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley > wrote: > > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling > wrote: > >> Dear All, > >> > >> I have recently had an issue using snes_mf_operator. I've tried to > figure it > >> out from PETSc manual and PETSc website but didn't get any luck, so I > submit > >> my question here and hope some one could help me out. > >> > >> (1) > >> ================================================================= > >> A little bit background here: my problem has 7 variables, i.e., > >> > >> U = [U0, U1, U2, U3, U4, U5, U6] > >> > >> U0 is in the order of 1. > >> U1, U2, U4 and U5 in the oder of 100. > >> U3 and U6 are in the order of 1.e8. > >> > >> I believe this should be quite common for most PETSc users. > >> > >> (2) > >> ================================================================= > >> My problem here is, U0, by its physical meaning, has to be limited > between 0 > >> and 1. When PETSc starts to perturb the initial solution of U (which I > >> believe properly set) to approximate the operation of J (dU), the U0 > get a > >> perturbation size in the order of 100, which causes problem as U0 has > to be > >> smaller than 1. > >> > >> From my observation, this same perturbation size, say eps, is applied > on all > >> U0, U1, U2, etc. <=== Is this the default setting? 
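(On the question just above: yes, as far as I know a single scalar differencing parameter h is used for the whole perturbation, not one per component. From the MatCreateMFFD page linked above, the two built-in ways of choosing it are roughly

    wp (I believe the default):  h = error_rel * sqrt(1 + ||u||) / ||a||
    ds:                          h = error_rel * u'a / ||a||^2                          if |u'a| > umin * ||a||_1
                                 h = error_rel * umin * sign(u'a) * ||a||_1 / ||a||^2   otherwise

where u is the current state, a is the direction being differenced, error_rel is set with -mat_mffd_err and umin with -mat_mffd_umin; umin only enters the ds formula, which is selected with -mat_mffd_type ds. With components spread over eight orders of magnitude any single h is a compromise, and a value that is reasonable for the 1e8-scale entries can easily be huge relative to U0, which is why rescaling the variables (the non-dimensionalization already suggested) is the robust fix.)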
> >> I also guess that this eps, in the order of 100, is determined from my > >> initial solution vector and other related PETSc parameters. <=== Is my > >> guessing right? > >> > >> (3) > >> ================================================================= > >> My question: I'd like to avoid a perturbation size ~100 on U0, i.e., I > have > >> to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 > >> situation. Is there any way to control that? > >> Or, is there any advanced option to control the perturbation size on > >> different variables when using snes_mf_operator? > > > > Here is a description of the algorithm for calculating h. It seems to > > me a better way to do this > > is to non-dimensionalize first. > > I forgot the URL: > > http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD > > Matt > > > Matt > > > >> > >> Hope my explanation is clear. Please let me know if it is not. > >> > >> > >> Best Regards, > >> > >> Ling > >> > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ling.zou at inl.gov Tue Dec 11 16:19:31 2012 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 11 Dec 2012 15:19:31 -0700 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: Matt, one more question. Can I combine the options -snes_type test and -mat_mffd_err 1.e-10 to see the effect? Best, Ling On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling wrote: > thank you Matt. I will try to figure it out. Non-dimensionalization is > certainly something worth to try. > > Best, > > Ling > > > On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley wrote: > >> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley >> wrote: >> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling >> wrote: >> >> Dear All, >> >> >> >> I have recently had an issue using snes_mf_operator. I've tried to >> figure it >> >> out from PETSc manual and PETSc website but didn't get any luck, so I >> submit >> >> my question here and hope some one could help me out. >> >> >> >> (1) >> >> ================================================================= >> >> A little bit background here: my problem has 7 variables, i.e., >> >> >> >> U = [U0, U1, U2, U3, U4, U5, U6] >> >> >> >> U0 is in the order of 1. >> >> U1, U2, U4 and U5 in the oder of 100. >> >> U3 and U6 are in the order of 1.e8. >> >> >> >> I believe this should be quite common for most PETSc users. >> >> >> >> (2) >> >> ================================================================= >> >> My problem here is, U0, by its physical meaning, has to be limited >> between 0 >> >> and 1. When PETSc starts to perturb the initial solution of U (which I >> >> believe properly set) to approximate the operation of J (dU), the U0 >> get a >> >> perturbation size in the order of 100, which causes problem as U0 has >> to be >> >> smaller than 1. >> >> >> >> From my observation, this same perturbation size, say eps, is applied >> on all >> >> U0, U1, U2, etc. <=== Is this the default setting? 
>> >> I also guess that this eps, in the order of 100, is determined from my >> >> initial solution vector and other related PETSc parameters. <=== Is my >> >> guessing right? >> >> >> >> (3) >> >> ================================================================= >> >> My question: I'd like to avoid a perturbation size ~100 on U0, i.e., I >> have >> >> to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 >> >> situation. Is there any way to control that? >> >> Or, is there any advanced option to control the perturbation size on >> >> different variables when using snes_mf_operator? >> > >> > Here is a description of the algorithm for calculating h. It seems to >> > me a better way to do this >> > is to non-dimensionalize first. >> >> I forgot the URL: >> >> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD >> >> Matt >> >> > Matt >> > >> >> >> >> Hope my explanation is clear. Please let me know if it is not. >> >> >> >> >> >> Best Regards, >> >> >> >> Ling >> >> >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments is infinitely more interesting than any results to which >> > their experiments lead. >> > -- Norbert Wiener >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Dec 11 16:29:33 2012 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 Dec 2012 14:29:33 -0800 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: On Tue, Dec 11, 2012 at 2:19 PM, Zou (Non-US), Ling wrote: > Matt, one more question. > > Can I combine the options > -snes_type test > and > -mat_mffd_err 1.e-10 > to see the effect? I do not understand your question. test does compare the analytic and FD Jacobian actions, but I thought you did not have an analytic action. Matt > Best, > > Ling > > > > On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling > wrote: >> >> thank you Matt. I will try to figure it out. Non-dimensionalization is >> certainly something worth to try. >> >> Best, >> >> Ling >> >> >> On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley >> wrote: >>> >>> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley >>> wrote: >>> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling >>> > wrote: >>> >> Dear All, >>> >> >>> >> I have recently had an issue using snes_mf_operator. I've tried to >>> >> figure it >>> >> out from PETSc manual and PETSc website but didn't get any luck, so I >>> >> submit >>> >> my question here and hope some one could help me out. >>> >> >>> >> (1) >>> >> ================================================================= >>> >> A little bit background here: my problem has 7 variables, i.e., >>> >> >>> >> U = [U0, U1, U2, U3, U4, U5, U6] >>> >> >>> >> U0 is in the order of 1. >>> >> U1, U2, U4 and U5 in the oder of 100. >>> >> U3 and U6 are in the order of 1.e8. >>> >> >>> >> I believe this should be quite common for most PETSc users. >>> >> >>> >> (2) >>> >> ================================================================= >>> >> My problem here is, U0, by its physical meaning, has to be limited >>> >> between 0 >>> >> and 1. 
When PETSc starts to perturb the initial solution of U (which I >>> >> believe properly set) to approximate the operation of J (dU), the U0 >>> >> get a >>> >> perturbation size in the order of 100, which causes problem as U0 has >>> >> to be >>> >> smaller than 1. >>> >> >>> >> From my observation, this same perturbation size, say eps, is applied >>> >> on all >>> >> U0, U1, U2, etc. <=== Is this the default setting? >>> >> I also guess that this eps, in the order of 100, is determined from my >>> >> initial solution vector and other related PETSc parameters. <=== Is >>> >> my >>> >> guessing right? >>> >> >>> >> (3) >>> >> ================================================================= >>> >> My question: I'd like to avoid a perturbation size ~100 on U0, i.e., I >>> >> have >>> >> to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 >>> >> situation. Is there any way to control that? >>> >> Or, is there any advanced option to control the perturbation size on >>> >> different variables when using snes_mf_operator? >>> > >>> > Here is a description of the algorithm for calculating h. It seems to >>> > me a better way to do this >>> > is to non-dimensionalize first. >>> >>> I forgot the URL: >>> >>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD >>> >>> Matt >>> >>> > Matt >>> > >>> >> >>> >> Hope my explanation is clear. Please let me know if it is not. >>> >> >>> >> >>> >> Best Regards, >>> >> >>> >> Ling >>> >> >>> > >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> > experiments is infinitely more interesting than any results to which >>> > their experiments lead. >>> > -- Norbert Wiener >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which >>> their experiments lead. >>> -- Norbert Wiener >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From ling.zou at inl.gov Tue Dec 11 16:40:42 2012 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 11 Dec 2012 15:40:42 -0700 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: Hmm... I have an 'approximated' analytical Jacobian to compare. And I did this: ./my-moose-project -i input.i -snes_type test -snes_test_display > out I actually found out that the PETSc provided FD Jacobian gives 'nan' numbers, while my approximated Jacobian does not give 'nan' at the same positions. As we discussed in the previous emails, the perturbation on U0 is too large, which makes 'nan' appear in the FD Jacobians. So....I am trying to use a smaller '-mat_mffd_err ', to see if I could get an easy fix by now, like this, ./my-moose-project -i input.i -snes_type test -md_mffd_err 1.e-10 -snes_test_display > out seems not working :-( no matter what number I give to -md_mffd_err, the print out results seem not changed. But of course, non-dimensionalization might be the ultimate solution. Ling On Tue, Dec 11, 2012 at 3:29 PM, Matthew Knepley wrote: > On Tue, Dec 11, 2012 at 2:19 PM, Zou (Non-US), Ling > wrote: > > Matt, one more question. > > > > Can I combine the options > > -snes_type test > > and > > -mat_mffd_err 1.e-10 > > to see the effect? > > I do not understand your question. 
test does compare the analytic and > FD Jacobian > actions, but I thought you did not have an analytic action. > > Matt > > > Best, > > > > Ling > > > > > > > > On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling > > wrote: > >> > >> thank you Matt. I will try to figure it out. Non-dimensionalization is > >> certainly something worth to try. > >> > >> Best, > >> > >> Ling > >> > >> > >> On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley > >> wrote: > >>> > >>> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley > >>> wrote: > >>> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling < > ling.zou at inl.gov> > >>> > wrote: > >>> >> Dear All, > >>> >> > >>> >> I have recently had an issue using snes_mf_operator. I've tried to > >>> >> figure it > >>> >> out from PETSc manual and PETSc website but didn't get any luck, so > I > >>> >> submit > >>> >> my question here and hope some one could help me out. > >>> >> > >>> >> (1) > >>> >> ================================================================= > >>> >> A little bit background here: my problem has 7 variables, i.e., > >>> >> > >>> >> U = [U0, U1, U2, U3, U4, U5, U6] > >>> >> > >>> >> U0 is in the order of 1. > >>> >> U1, U2, U4 and U5 in the oder of 100. > >>> >> U3 and U6 are in the order of 1.e8. > >>> >> > >>> >> I believe this should be quite common for most PETSc users. > >>> >> > >>> >> (2) > >>> >> ================================================================= > >>> >> My problem here is, U0, by its physical meaning, has to be limited > >>> >> between 0 > >>> >> and 1. When PETSc starts to perturb the initial solution of U > (which I > >>> >> believe properly set) to approximate the operation of J (dU), the U0 > >>> >> get a > >>> >> perturbation size in the order of 100, which causes problem as U0 > has > >>> >> to be > >>> >> smaller than 1. > >>> >> > >>> >> From my observation, this same perturbation size, say eps, is > applied > >>> >> on all > >>> >> U0, U1, U2, etc. <=== Is this the default setting? > >>> >> I also guess that this eps, in the order of 100, is determined from > my > >>> >> initial solution vector and other related PETSc parameters. <=== Is > >>> >> my > >>> >> guessing right? > >>> >> > >>> >> (3) > >>> >> ================================================================= > >>> >> My question: I'd like to avoid a perturbation size ~100 on U0, > i.e., I > >>> >> have > >>> >> to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 > >>> >> situation. Is there any way to control that? > >>> >> Or, is there any advanced option to control the perturbation size on > >>> >> different variables when using snes_mf_operator? > >>> > > >>> > Here is a description of the algorithm for calculating h. It seems to > >>> > me a better way to do this > >>> > is to non-dimensionalize first. > >>> > >>> I forgot the URL: > >>> > >>> > http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD > >>> > >>> Matt > >>> > >>> > Matt > >>> > > >>> >> > >>> >> Hope my explanation is clear. Please let me know if it is not. > >>> >> > >>> >> > >>> >> Best Regards, > >>> >> > >>> >> Ling > >>> >> > >>> > > >>> > > >>> > > >>> > -- > >>> > What most experimenters take for granted before they begin their > >>> > experiments is infinitely more interesting than any results to which > >>> > their experiments lead. 
> >>> > -- Norbert Wiener > >>> > >>> > >>> > >>> -- > >>> What most experimenters take for granted before they begin their > >>> experiments is infinitely more interesting than any results to which > >>> their experiments lead. > >>> -- Norbert Wiener > >> > >> > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Dec 11 16:50:52 2012 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 Dec 2012 14:50:52 -0800 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: On Tue, Dec 11, 2012 at 2:40 PM, Zou (Non-US), Ling wrote: > Hmm... I have an 'approximated' analytical Jacobian to compare. And I did > this: > > ./my-moose-project -i input.i -snes_type test -snes_test_display > out > > I actually found out that the PETSc provided FD Jacobian gives 'nan' > numbers, while my approximated Jacobian does not give 'nan' at the same > positions. > > As we discussed in the previous emails, the perturbation on U0 is too large, > which makes 'nan' appear in the FD Jacobians. So....I am trying to use a > smaller '-mat_mffd_err ', to see if I could get an easy fix by > now, like this, I don't think 'err' has anything to do with it. If you read the page I mailed you, I believe umin can be made very small. Matt > ./my-moose-project -i input.i -snes_type test -md_mffd_err 1.e-10 > -snes_test_display > out > > seems not working :-( > no matter what number I give to -md_mffd_err, the print out results seem not > changed. > > But of course, non-dimensionalization might be the ultimate solution. > > Ling > > On Tue, Dec 11, 2012 at 3:29 PM, Matthew Knepley wrote: >> >> On Tue, Dec 11, 2012 at 2:19 PM, Zou (Non-US), Ling >> wrote: >> > Matt, one more question. >> > >> > Can I combine the options >> > -snes_type test >> > and >> > -mat_mffd_err 1.e-10 >> > to see the effect? >> >> I do not understand your question. test does compare the analytic and >> FD Jacobian >> actions, but I thought you did not have an analytic action. >> >> Matt >> >> > Best, >> > >> > Ling >> > >> > >> > >> > On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling >> > wrote: >> >> >> >> thank you Matt. I will try to figure it out. Non-dimensionalization is >> >> certainly something worth to try. >> >> >> >> Best, >> >> >> >> Ling >> >> >> >> >> >> On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley >> >> wrote: >> >>> >> >>> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley >> >>> wrote: >> >>> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling >> >>> > >> >>> > wrote: >> >>> >> Dear All, >> >>> >> >> >>> >> I have recently had an issue using snes_mf_operator. I've tried to >> >>> >> figure it >> >>> >> out from PETSc manual and PETSc website but didn't get any luck, so >> >>> >> I >> >>> >> submit >> >>> >> my question here and hope some one could help me out. >> >>> >> >> >>> >> (1) >> >>> >> ================================================================= >> >>> >> A little bit background here: my problem has 7 variables, i.e., >> >>> >> >> >>> >> U = [U0, U1, U2, U3, U4, U5, U6] >> >>> >> >> >>> >> U0 is in the order of 1. >> >>> >> U1, U2, U4 and U5 in the oder of 100. >> >>> >> U3 and U6 are in the order of 1.e8. >> >>> >> >> >>> >> I believe this should be quite common for most PETSc users. 
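(An aside on the "-mat_mffd_err seems not working" observation above: as far as I can tell, -snes_type test does not use the matrix-free MFFD operator at all; it builds its comparison Jacobian with PETSc's explicit finite-difference routine (SNESDefaultComputeJacobian in 3.3) and checks that against the user-provided one, so the -mat_mffd_* options are not expected to change its output. Those options only affect the operator applied during an actual -snes_mf / -snes_mf_operator solve, and -mat_mffd_umin in particular belongs to the ds differencing variant, so an untested, illustrative invocation where it could matter would look like

    ./my-moose-project -i input.i -snes_mf_operator -mat_mffd_type ds -mat_mffd_umin 1.e-6

rather than a -snes_type test run.)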
>> >>> >> >> >>> >> (2) >> >>> >> ================================================================= >> >>> >> My problem here is, U0, by its physical meaning, has to be limited >> >>> >> between 0 >> >>> >> and 1. When PETSc starts to perturb the initial solution of U >> >>> >> (which I >> >>> >> believe properly set) to approximate the operation of J (dU), the >> >>> >> U0 >> >>> >> get a >> >>> >> perturbation size in the order of 100, which causes problem as U0 >> >>> >> has >> >>> >> to be >> >>> >> smaller than 1. >> >>> >> >> >>> >> From my observation, this same perturbation size, say eps, is >> >>> >> applied >> >>> >> on all >> >>> >> U0, U1, U2, etc. <=== Is this the default setting? >> >>> >> I also guess that this eps, in the order of 100, is determined from >> >>> >> my >> >>> >> initial solution vector and other related PETSc parameters. <=== >> >>> >> Is >> >>> >> my >> >>> >> guessing right? >> >>> >> >> >>> >> (3) >> >>> >> ================================================================= >> >>> >> My question: I'd like to avoid a perturbation size ~100 on U0, >> >>> >> i.e., I >> >>> >> have >> >>> >> to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 >> >>> >> situation. Is there any way to control that? >> >>> >> Or, is there any advanced option to control the perturbation size >> >>> >> on >> >>> >> different variables when using snes_mf_operator? >> >>> > >> >>> > Here is a description of the algorithm for calculating h. It seems >> >>> > to >> >>> > me a better way to do this >> >>> > is to non-dimensionalize first. >> >>> >> >>> I forgot the URL: >> >>> >> >>> >> >>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD >> >>> >> >>> Matt >> >>> >> >>> > Matt >> >>> > >> >>> >> >> >>> >> Hope my explanation is clear. Please let me know if it is not. >> >>> >> >> >>> >> >> >>> >> Best Regards, >> >>> >> >> >>> >> Ling >> >>> >> >> >>> > >> >>> > >> >>> > >> >>> > -- >> >>> > What most experimenters take for granted before they begin their >> >>> > experiments is infinitely more interesting than any results to which >> >>> > their experiments lead. >> >>> > -- Norbert Wiener >> >>> >> >>> >> >>> >> >>> -- >> >>> What most experimenters take for granted before they begin their >> >>> experiments is infinitely more interesting than any results to which >> >>> their experiments lead. >> >>> -- Norbert Wiener >> >> >> >> >> > >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From ling.zou at inl.gov Tue Dec 11 16:59:10 2012 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 11 Dec 2012 15:59:10 -0700 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: ok. I tried. Seems there is no effect. ./my-moose-project -i input.i -snes_type test -mat_mffd_umin 1.e-10 -snes_test_display > out Also, the webpage says: *-mat_mffd_unim *I am not quite sure if 'unim' is a typo. I tried both 'umin' and 'unim' anyway. Ling On Tue, Dec 11, 2012 at 3:50 PM, Matthew Knepley wrote: > On Tue, Dec 11, 2012 at 2:40 PM, Zou (Non-US), Ling > wrote: > > Hmm... I have an 'approximated' analytical Jacobian to compare. 
And I did > > this: > > > > ./my-moose-project -i input.i -snes_type test -snes_test_display > out > > > > I actually found out that the PETSc provided FD Jacobian gives 'nan' > > numbers, while my approximated Jacobian does not give 'nan' at the same > > positions. > > > > As we discussed in the previous emails, the perturbation on U0 is too > large, > > which makes 'nan' appear in the FD Jacobians. So....I am trying to use a > > smaller '-mat_mffd_err ', to see if I could get an easy fix > by > > now, like this, > > I don't think 'err' has anything to do with it. If you read the page I > mailed you, I > believe umin can be made very small. > > Matt > > > ./my-moose-project -i input.i -snes_type test -md_mffd_err 1.e-10 > > -snes_test_display > out > > > > seems not working :-( > > no matter what number I give to -md_mffd_err, the print out results seem > not > > changed. > > > > But of course, non-dimensionalization might be the ultimate solution. > > > > Ling > > > > On Tue, Dec 11, 2012 at 3:29 PM, Matthew Knepley > wrote: > >> > >> On Tue, Dec 11, 2012 at 2:19 PM, Zou (Non-US), Ling > >> wrote: > >> > Matt, one more question. > >> > > >> > Can I combine the options > >> > -snes_type test > >> > and > >> > -mat_mffd_err 1.e-10 > >> > to see the effect? > >> > >> I do not understand your question. test does compare the analytic and > >> FD Jacobian > >> actions, but I thought you did not have an analytic action. > >> > >> Matt > >> > >> > Best, > >> > > >> > Ling > >> > > >> > > >> > > >> > On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling > > >> > wrote: > >> >> > >> >> thank you Matt. I will try to figure it out. Non-dimensionalization > is > >> >> certainly something worth to try. > >> >> > >> >> Best, > >> >> > >> >> Ling > >> >> > >> >> > >> >> On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley > >> >> wrote: > >> >>> > >> >>> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley > > >> >>> wrote: > >> >>> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling > >> >>> > > >> >>> > wrote: > >> >>> >> Dear All, > >> >>> >> > >> >>> >> I have recently had an issue using snes_mf_operator. I've tried > to > >> >>> >> figure it > >> >>> >> out from PETSc manual and PETSc website but didn't get any luck, > so > >> >>> >> I > >> >>> >> submit > >> >>> >> my question here and hope some one could help me out. > >> >>> >> > >> >>> >> (1) > >> >>> >> ================================================================= > >> >>> >> A little bit background here: my problem has 7 variables, i.e., > >> >>> >> > >> >>> >> U = [U0, U1, U2, U3, U4, U5, U6] > >> >>> >> > >> >>> >> U0 is in the order of 1. > >> >>> >> U1, U2, U4 and U5 in the oder of 100. > >> >>> >> U3 and U6 are in the order of 1.e8. > >> >>> >> > >> >>> >> I believe this should be quite common for most PETSc users. > >> >>> >> > >> >>> >> (2) > >> >>> >> ================================================================= > >> >>> >> My problem here is, U0, by its physical meaning, has to be > limited > >> >>> >> between 0 > >> >>> >> and 1. When PETSc starts to perturb the initial solution of U > >> >>> >> (which I > >> >>> >> believe properly set) to approximate the operation of J (dU), the > >> >>> >> U0 > >> >>> >> get a > >> >>> >> perturbation size in the order of 100, which causes problem as U0 > >> >>> >> has > >> >>> >> to be > >> >>> >> smaller than 1. > >> >>> >> > >> >>> >> From my observation, this same perturbation size, say eps, is > >> >>> >> applied > >> >>> >> on all > >> >>> >> U0, U1, U2, etc. 
<=== Is this the default setting? > >> >>> >> I also guess that this eps, in the order of 100, is determined > from > >> >>> >> my > >> >>> >> initial solution vector and other related PETSc parameters. <=== > >> >>> >> Is > >> >>> >> my > >> >>> >> guessing right? > >> >>> >> > >> >>> >> (3) > >> >>> >> ================================================================= > >> >>> >> My question: I'd like to avoid a perturbation size ~100 on U0, > >> >>> >> i.e., I > >> >>> >> have > >> >>> >> to limit it to be ~0.01 (or some small number) to avoid the U0 > > 1 > >> >>> >> situation. Is there any way to control that? > >> >>> >> Or, is there any advanced option to control the perturbation size > >> >>> >> on > >> >>> >> different variables when using snes_mf_operator? > >> >>> > > >> >>> > Here is a description of the algorithm for calculating h. It seems > >> >>> > to > >> >>> > me a better way to do this > >> >>> > is to non-dimensionalize first. > >> >>> > >> >>> I forgot the URL: > >> >>> > >> >>> > >> >>> > http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD > >> >>> > >> >>> Matt > >> >>> > >> >>> > Matt > >> >>> > > >> >>> >> > >> >>> >> Hope my explanation is clear. Please let me know if it is not. > >> >>> >> > >> >>> >> > >> >>> >> Best Regards, > >> >>> >> > >> >>> >> Ling > >> >>> >> > >> >>> > > >> >>> > > >> >>> > > >> >>> > -- > >> >>> > What most experimenters take for granted before they begin their > >> >>> > experiments is infinitely more interesting than any results to > which > >> >>> > their experiments lead. > >> >>> > -- Norbert Wiener > >> >>> > >> >>> > >> >>> > >> >>> -- > >> >>> What most experimenters take for granted before they begin their > >> >>> experiments is infinitely more interesting than any results to which > >> >>> their experiments lead. > >> >>> -- Norbert Wiener > >> >> > >> >> > >> > > >> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which > >> their experiments lead. > >> -- Norbert Wiener > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Dec 11 17:02:14 2012 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 Dec 2012 15:02:14 -0800 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: On Tue, Dec 11, 2012 at 2:59 PM, Zou (Non-US), Ling wrote: > ok. I tried. Seems there is no effect. > > ./my-moose-project -i input.i -snes_type test -mat_mffd_umin > 1.e-10 -snes_test_display > out > > Also, the webpage says: > *-mat_mffd_unim > > *I am not quite sure if 'unim' is a typo. I tried both 'umin' and 'unim' > anyway. > You can check what is coming in, right? But this is all academic, with that scaling, you will get almost no significant figures in the Jacobian for those unknowns, so why worry about it. Nondimensionalize. Matt > Ling > > > > On Tue, Dec 11, 2012 at 3:50 PM, Matthew Knepley wrote: > >> On Tue, Dec 11, 2012 at 2:40 PM, Zou (Non-US), Ling >> wrote: >> > Hmm... I have an 'approximated' analytical Jacobian to compare. 
And I >> did >> > this: >> > >> > ./my-moose-project -i input.i -snes_type test -snes_test_display > out >> > >> > I actually found out that the PETSc provided FD Jacobian gives 'nan' >> > numbers, while my approximated Jacobian does not give 'nan' at the same >> > positions. >> > >> > As we discussed in the previous emails, the perturbation on U0 is too >> large, >> > which makes 'nan' appear in the FD Jacobians. So....I am trying to use a >> > smaller '-mat_mffd_err ', to see if I could get an easy >> fix by >> > now, like this, >> >> I don't think 'err' has anything to do with it. If you read the page I >> mailed you, I >> believe umin can be made very small. >> >> Matt >> >> > ./my-moose-project -i input.i -snes_type test -md_mffd_err 1.e-10 >> > -snes_test_display > out >> > >> > seems not working :-( >> > no matter what number I give to -md_mffd_err, the print out results >> seem not >> > changed. >> > >> > But of course, non-dimensionalization might be the ultimate solution. >> > >> > Ling >> > >> > On Tue, Dec 11, 2012 at 3:29 PM, Matthew Knepley >> wrote: >> >> >> >> On Tue, Dec 11, 2012 at 2:19 PM, Zou (Non-US), Ling >> >> wrote: >> >> > Matt, one more question. >> >> > >> >> > Can I combine the options >> >> > -snes_type test >> >> > and >> >> > -mat_mffd_err 1.e-10 >> >> > to see the effect? >> >> >> >> I do not understand your question. test does compare the analytic and >> >> FD Jacobian >> >> actions, but I thought you did not have an analytic action. >> >> >> >> Matt >> >> >> >> > Best, >> >> > >> >> > Ling >> >> > >> >> > >> >> > >> >> > On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling < >> ling.zou at inl.gov> >> >> > wrote: >> >> >> >> >> >> thank you Matt. I will try to figure it out. Non-dimensionalization >> is >> >> >> certainly something worth to try. >> >> >> >> >> >> Best, >> >> >> >> >> >> Ling >> >> >> >> >> >> >> >> >> On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley > > >> >> >> wrote: >> >> >>> >> >> >>> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley < >> knepley at gmail.com> >> >> >>> wrote: >> >> >>> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling >> >> >>> > >> >> >>> > wrote: >> >> >>> >> Dear All, >> >> >>> >> >> >> >>> >> I have recently had an issue using snes_mf_operator. I've tried >> to >> >> >>> >> figure it >> >> >>> >> out from PETSc manual and PETSc website but didn't get any >> luck, so >> >> >>> >> I >> >> >>> >> submit >> >> >>> >> my question here and hope some one could help me out. >> >> >>> >> >> >> >>> >> (1) >> >> >>> >> >> ================================================================= >> >> >>> >> A little bit background here: my problem has 7 variables, i.e., >> >> >>> >> >> >> >>> >> U = [U0, U1, U2, U3, U4, U5, U6] >> >> >>> >> >> >> >>> >> U0 is in the order of 1. >> >> >>> >> U1, U2, U4 and U5 in the oder of 100. >> >> >>> >> U3 and U6 are in the order of 1.e8. >> >> >>> >> >> >> >>> >> I believe this should be quite common for most PETSc users. >> >> >>> >> >> >> >>> >> (2) >> >> >>> >> >> ================================================================= >> >> >>> >> My problem here is, U0, by its physical meaning, has to be >> limited >> >> >>> >> between 0 >> >> >>> >> and 1. When PETSc starts to perturb the initial solution of U >> >> >>> >> (which I >> >> >>> >> believe properly set) to approximate the operation of J (dU), >> the >> >> >>> >> U0 >> >> >>> >> get a >> >> >>> >> perturbation size in the order of 100, which causes problem as >> U0 >> >> >>> >> has >> >> >>> >> to be >> >> >>> >> smaller than 1. 
>> >> >>> >> >> >> >>> >> From my observation, this same perturbation size, say eps, is >> >> >>> >> applied >> >> >>> >> on all >> >> >>> >> U0, U1, U2, etc. <=== Is this the default setting? >> >> >>> >> I also guess that this eps, in the order of 100, is determined >> from >> >> >>> >> my >> >> >>> >> initial solution vector and other related PETSc parameters. >> <=== >> >> >>> >> Is >> >> >>> >> my >> >> >>> >> guessing right? >> >> >>> >> >> >> >>> >> (3) >> >> >>> >> >> ================================================================= >> >> >>> >> My question: I'd like to avoid a perturbation size ~100 on U0, >> >> >>> >> i.e., I >> >> >>> >> have >> >> >>> >> to limit it to be ~0.01 (or some small number) to avoid the U0 >> > 1 >> >> >>> >> situation. Is there any way to control that? >> >> >>> >> Or, is there any advanced option to control the perturbation >> size >> >> >>> >> on >> >> >>> >> different variables when using snes_mf_operator? >> >> >>> > >> >> >>> > Here is a description of the algorithm for calculating h. It >> seems >> >> >>> > to >> >> >>> > me a better way to do this >> >> >>> > is to non-dimensionalize first. >> >> >>> >> >> >>> I forgot the URL: >> >> >>> >> >> >>> >> >> >>> >> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD >> >> >>> >> >> >>> Matt >> >> >>> >> >> >>> > Matt >> >> >>> > >> >> >>> >> >> >> >>> >> Hope my explanation is clear. Please let me know if it is not. >> >> >>> >> >> >> >>> >> >> >> >>> >> Best Regards, >> >> >>> >> >> >> >>> >> Ling >> >> >>> >> >> >> >>> > >> >> >>> > >> >> >>> > >> >> >>> > -- >> >> >>> > What most experimenters take for granted before they begin their >> >> >>> > experiments is infinitely more interesting than any results to >> which >> >> >>> > their experiments lead. >> >> >>> > -- Norbert Wiener >> >> >>> >> >> >>> >> >> >>> >> >> >>> -- >> >> >>> What most experimenters take for granted before they begin their >> >> >>> experiments is infinitely more interesting than any results to >> which >> >> >>> their experiments lead. >> >> >>> -- Norbert Wiener >> >> >> >> >> >> >> >> > >> >> >> >> >> >> >> >> -- >> >> What most experimenters take for granted before they begin their >> >> experiments is infinitely more interesting than any results to which >> >> their experiments lead. >> >> -- Norbert Wiener >> > >> > >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ling.zou at inl.gov Tue Dec 11 17:54:56 2012 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 11 Dec 2012 16:54:56 -0700 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: Thank you Matt. Ling On Tue, Dec 11, 2012 at 4:02 PM, Matthew Knepley wrote: > On Tue, Dec 11, 2012 at 2:59 PM, Zou (Non-US), Ling wrote: > >> ok. I tried. Seems there is no effect. >> >> ./my-moose-project -i input.i -snes_type test -mat_mffd_umin >> 1.e-10 -snes_test_display > out >> >> Also, the webpage says: >> *-mat_mffd_unim >> >> *I am not quite sure if 'unim' is a typo. I tried both 'umin' and 'unim' >> anyway. >> > > You can check what is coming in, right? 
But this is all academic, with > that scaling, you will get almost > no significant figures in the Jacobian for those unknowns, so why worry > about it. Nondimensionalize. > > Matt > > >> Ling >> >> >> >> On Tue, Dec 11, 2012 at 3:50 PM, Matthew Knepley wrote: >> >>> On Tue, Dec 11, 2012 at 2:40 PM, Zou (Non-US), Ling >>> wrote: >>> > Hmm... I have an 'approximated' analytical Jacobian to compare. And I >>> did >>> > this: >>> > >>> > ./my-moose-project -i input.i -snes_type test -snes_test_display > out >>> > >>> > I actually found out that the PETSc provided FD Jacobian gives 'nan' >>> > numbers, while my approximated Jacobian does not give 'nan' at the same >>> > positions. >>> > >>> > As we discussed in the previous emails, the perturbation on U0 is too >>> large, >>> > which makes 'nan' appear in the FD Jacobians. So....I am trying to use >>> a >>> > smaller '-mat_mffd_err ', to see if I could get an easy >>> fix by >>> > now, like this, >>> >>> I don't think 'err' has anything to do with it. If you read the page I >>> mailed you, I >>> believe umin can be made very small. >>> >>> Matt >>> >>> > ./my-moose-project -i input.i -snes_type test -md_mffd_err 1.e-10 >>> > -snes_test_display > out >>> > >>> > seems not working :-( >>> > no matter what number I give to -md_mffd_err, the print out results >>> seem not >>> > changed. >>> > >>> > But of course, non-dimensionalization might be the ultimate solution. >>> > >>> > Ling >>> > >>> > On Tue, Dec 11, 2012 at 3:29 PM, Matthew Knepley >>> wrote: >>> >> >>> >> On Tue, Dec 11, 2012 at 2:19 PM, Zou (Non-US), Ling >> > >>> >> wrote: >>> >> > Matt, one more question. >>> >> > >>> >> > Can I combine the options >>> >> > -snes_type test >>> >> > and >>> >> > -mat_mffd_err 1.e-10 >>> >> > to see the effect? >>> >> >>> >> I do not understand your question. test does compare the analytic and >>> >> FD Jacobian >>> >> actions, but I thought you did not have an analytic action. >>> >> >>> >> Matt >>> >> >>> >> > Best, >>> >> > >>> >> > Ling >>> >> > >>> >> > >>> >> > >>> >> > On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling < >>> ling.zou at inl.gov> >>> >> > wrote: >>> >> >> >>> >> >> thank you Matt. I will try to figure it out. >>> Non-dimensionalization is >>> >> >> certainly something worth to try. >>> >> >> >>> >> >> Best, >>> >> >> >>> >> >> Ling >>> >> >> >>> >> >> >>> >> >> On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley < >>> knepley at gmail.com> >>> >> >> wrote: >>> >> >>> >>> >> >>> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley < >>> knepley at gmail.com> >>> >> >>> wrote: >>> >> >>> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling >>> >> >>> > >>> >> >>> > wrote: >>> >> >>> >> Dear All, >>> >> >>> >> >>> >> >>> >> I have recently had an issue using snes_mf_operator. I've >>> tried to >>> >> >>> >> figure it >>> >> >>> >> out from PETSc manual and PETSc website but didn't get any >>> luck, so >>> >> >>> >> I >>> >> >>> >> submit >>> >> >>> >> my question here and hope some one could help me out. >>> >> >>> >> >>> >> >>> >> (1) >>> >> >>> >> >>> ================================================================= >>> >> >>> >> A little bit background here: my problem has 7 variables, i.e., >>> >> >>> >> >>> >> >>> >> U = [U0, U1, U2, U3, U4, U5, U6] >>> >> >>> >> >>> >> >>> >> U0 is in the order of 1. >>> >> >>> >> U1, U2, U4 and U5 in the oder of 100. >>> >> >>> >> U3 and U6 are in the order of 1.e8. >>> >> >>> >> >>> >> >>> >> I believe this should be quite common for most PETSc users. 
>>> >> >>> >> >>> >> >>> >> (2) >>> >> >>> >> >>> ================================================================= >>> >> >>> >> My problem here is, U0, by its physical meaning, has to be >>> limited >>> >> >>> >> between 0 >>> >> >>> >> and 1. When PETSc starts to perturb the initial solution of U >>> >> >>> >> (which I >>> >> >>> >> believe properly set) to approximate the operation of J (dU), >>> the >>> >> >>> >> U0 >>> >> >>> >> get a >>> >> >>> >> perturbation size in the order of 100, which causes problem as >>> U0 >>> >> >>> >> has >>> >> >>> >> to be >>> >> >>> >> smaller than 1. >>> >> >>> >> >>> >> >>> >> From my observation, this same perturbation size, say eps, is >>> >> >>> >> applied >>> >> >>> >> on all >>> >> >>> >> U0, U1, U2, etc. <=== Is this the default setting? >>> >> >>> >> I also guess that this eps, in the order of 100, is determined >>> from >>> >> >>> >> my >>> >> >>> >> initial solution vector and other related PETSc parameters. >>> <=== >>> >> >>> >> Is >>> >> >>> >> my >>> >> >>> >> guessing right? >>> >> >>> >> >>> >> >>> >> (3) >>> >> >>> >> >>> ================================================================= >>> >> >>> >> My question: I'd like to avoid a perturbation size ~100 on U0, >>> >> >>> >> i.e., I >>> >> >>> >> have >>> >> >>> >> to limit it to be ~0.01 (or some small number) to avoid the U0 >>> > 1 >>> >> >>> >> situation. Is there any way to control that? >>> >> >>> >> Or, is there any advanced option to control the perturbation >>> size >>> >> >>> >> on >>> >> >>> >> different variables when using snes_mf_operator? >>> >> >>> > >>> >> >>> > Here is a description of the algorithm for calculating h. It >>> seems >>> >> >>> > to >>> >> >>> > me a better way to do this >>> >> >>> > is to non-dimensionalize first. >>> >> >>> >>> >> >>> I forgot the URL: >>> >> >>> >>> >> >>> >>> >> >>> >>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD >>> >> >>> >>> >> >>> Matt >>> >> >>> >>> >> >>> > Matt >>> >> >>> > >>> >> >>> >> >>> >> >>> >> Hope my explanation is clear. Please let me know if it is not. >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> Best Regards, >>> >> >>> >> >>> >> >>> >> Ling >>> >> >>> >> >>> >> >>> > >>> >> >>> > >>> >> >>> > >>> >> >>> > -- >>> >> >>> > What most experimenters take for granted before they begin their >>> >> >>> > experiments is infinitely more interesting than any results to >>> which >>> >> >>> > their experiments lead. >>> >> >>> > -- Norbert Wiener >>> >> >>> >>> >> >>> >>> >> >>> >>> >> >>> -- >>> >> >>> What most experimenters take for granted before they begin their >>> >> >>> experiments is infinitely more interesting than any results to >>> which >>> >> >>> their experiments lead. >>> >> >>> -- Norbert Wiener >>> >> >> >>> >> >> >>> >> > >>> >> >>> >> >>> >> >>> >> -- >>> >> What most experimenters take for granted before they begin their >>> >> experiments is infinitely more interesting than any results to which >>> >> their experiments lead. >>> >> -- Norbert Wiener >>> > >>> > >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which >>> their experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From m.guterres at gmail.com Thu Dec 13 04:55:10 2012 From: m.guterres at gmail.com (Marcelo Guterres) Date: Thu, 13 Dec 2012 08:55:10 -0200 Subject: [petsc-users] question of a Brazilian student about the use of "PETSC" in a cluster Message-ID: <50C9B40E.5090101@gmail.com> Hello, My name is Marcelo Guterres and I am PhD student in Brazil. I use the PETSC in a cluster with 11 computers, each with 8 processors "Xeon 2.8GHz", "16GB RAM" and "4 HD SAS" from "146GB". My question is about the PETSc following functions: / -> Ierr = MPI_Comm_rank (MPI_COMM_WORLD, & rank); CHKERRQ (ierr); -> Ierr = MPI_Comm_size (MPI_COMM_WORLD, & size); CHKERRQ (ierr);/ For example, using only MPI: ---------------------------------------------------------------------------------------------- // Program hello word using MPI #include using namespace std; #include int main(int argc, char *argv[]) { int size, rank; MPI_Init(&argc, &argv); MPI_Comm_size(MPI_COMM_WORLD,&size); MPI_Comm_rank(MPI_COMM_WORLD,&rank); cout << "hello. I am process " << rank << " of " << size << endl; if ( rank == 0) { cout << "\nFinish !!" << endl; } MPI_Finalize(); } ***** running the command has the following output:* [guterres at stratus hello]$ mpirun -np 3 ./hello hello. I am process 1 of 3 hello. I am process 0 of 3 Finish !! hello. I am process 2 of 3 CONCLUSION: int size = 3; ---------------------------------------------------------------------------------------------- using only the PETSC ---------------------------------------------------------------------------------------------- static char help[] ="\n\n hello word PETSC !!"; #include #include using namespace std; int main( int argc, char *argv[] ) { PetscErrorCode ierr; PetscMPIInt size, rank; ierr = PetscInitialize(&argc,&argv,(char *)0,help);CHKERRQ(ierr); ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank); CHKERRQ(ierr); ierr = MPI_Comm_size(MPI_COMM_WORLD,&size); CHKERRQ(ierr); cout << "hello. I am process " << rank << " of " << size << endl; if ( rank == 0) { cout << "\nfinish !!" << endl; } ierr = PetscFinalize( ); CHKERRQ(ierr); return 0; } ***** running the command has the following output:* [guterres at stratus hello_petsc]$ mpirun -np 3 ./hello hello. I am process 0 of 1 finish !! hello. I am process 0 of 1 finish !! hello. I am process 0 of 1 finish !! ---------------------------------------------------------------------------------------------- *MY QUESTION IS:* THE OUTPUT OF THE PROGRAM WITH PETSC CORRECT??? the variable value PetscMPIInt size = np ?? The correct output should not be: [guterres at stratus hello_petsc]$ mpirun -np 3 ./hello hello. I am process 0 of 3 hello. I am process 1 of 3 hello. I am process 2 of 3 finish !! CONCLUSION: PetscMPIInt size = 3 ?? Thank you for your attention and excuse my writing in English. Marcelo Guterres -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Dec 13 07:23:59 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 13 Dec 2012 07:23:59 -0600 Subject: [petsc-users] question of a Brazilian student about the use of "PETSC" in a cluster In-Reply-To: <50C9B40E.5090101@gmail.com> References: <50C9B40E.5090101@gmail.com> Message-ID: <2B738BAF-509D-42DF-8523-69E9A2D8D32F@mcs.anl.gov> > Marcelo, There is something wrong with the PETSc install. Are you absolutely sure that PETSc was ./configure with the same MPI as as the mpirun that you use to launch the program? You can send configure.log to petsc-maint at mcs.anl.gov if you cannot figure it out on your own. 
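(A quick way to check this, with illustrative paths: every process printing "0 of 1" is the classic symptom of launching a binary built against one MPI with the mpirun of a different MPI. Comparing

    which mpirun                        # the launcher being used
    which mpicc                         # the wrappers the code was compiled with
    grep -i mpiexec configure.log       # what PETSc's configure found or built

usually shows the mismatch. If PETSc was configured with --download-mpich, the matching launcher is the one PETSc installed, e.g.

    $PETSC_DIR/$PETSC_ARCH/bin/mpiexec -np 3 ./hello

rather than the system mpirun.)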
Barry On Dec 13, 2012, at 4:55 AM, Marcelo Guterres wrote: > Hello, > > My name is Marcelo Guterres and I am PhD student in Brazil. > > I use the PETSC in a cluster with 11 computers, each with 8 processors "Xeon 2.8GHz", "16GB RAM" and "4 HD SAS" from "146GB". > > My question is about the PETSc following functions: > > -> Ierr = MPI_Comm_rank (MPI_COMM_WORLD, & rank); CHKERRQ (ierr); > -> Ierr = MPI_Comm_size (MPI_COMM_WORLD, & size); CHKERRQ (ierr); > > > For example, using only MPI: > > ---------------------------------------------------------------------------------------------- > > // Program hello word using MPI > > #include > using namespace std; > #include > > int main(int argc, char *argv[]) > > { > int size, rank; > MPI_Init(&argc, &argv); > MPI_Comm_size(MPI_COMM_WORLD,&size); > MPI_Comm_rank(MPI_COMM_WORLD,&rank); > > cout << "hello. I am process " << rank << " of " << size << endl; > > if ( rank == 0) > { > cout << "\nFinish !!" << endl; > } > > MPI_Finalize(); > } > > > > **** running the command has the following output: > > [guterres at stratus hello]$ mpirun -np 3 ./hello > > hello. I am process 1 of 3 > hello. I am process 0 of 3 > > Finish !! > hello. I am process 2 of 3 > > > CONCLUSION: int size = 3; > > ---------------------------------------------------------------------------------------------- > > > using only the PETSC > > ---------------------------------------------------------------------------------------------- > > > static char help[] ="\n\n hello word PETSC !!"; > #include > #include > using namespace std; > > int main( int argc, char *argv[] ) > { > PetscErrorCode ierr; > > PetscMPIInt size, > rank; > > ierr = PetscInitialize(&argc,&argv,(char *)0,help);CHKERRQ(ierr); > ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank); CHKERRQ(ierr); > ierr = MPI_Comm_size(MPI_COMM_WORLD,&size); CHKERRQ(ierr); > > cout << "hello. I am process " << rank << " of " << size << endl; > > if ( rank == 0) > { > cout << "\nfinish !!" << endl; > } > > > ierr = PetscFinalize( ); CHKERRQ(ierr); > return 0; > } > > > **** running the command has the following output: > > > [guterres at stratus hello_petsc]$ mpirun -np 3 ./hello > > hello. I am process 0 of 1 > finish !! > > hello. I am process 0 of 1 > finish !! > > hello. I am process 0 of 1 > finish !! > > > ---------------------------------------------------------------------------------------------- > > > MY QUESTION IS: > > > THE OUTPUT OF THE PROGRAM WITH PETSC CORRECT??? > > the variable value PetscMPIInt size = np ?? > > The correct output should not be: > > > [guterres at stratus hello_petsc]$ mpirun -np 3 ./hello > > hello. I am process 0 of 3 > > hello. I am process 1 of 3 > > hello. I am process 2 of 3 > > finish !! > > CONCLUSION: PetscMPIInt size = 3 ?? > > > Thank you for your attention and excuse my writing in English. > > > Marcelo Guterres From thomas.witkowski at tu-dresden.de Thu Dec 13 09:41:33 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 13 Dec 2012 16:41:33 +0100 Subject: [petsc-users] Some tricky problem in my multilevel feti dp code Message-ID: <50C9F72D.2010006@tu-dresden.de> I have some problem in the implementation of my multilevel FETI DP code, where two block structured matrices must be multiplied. I'll give my best to explain the problem, may be one of you have an idea how to implement it. I think, the best is to make a small example: lets assume we have 16 subdomains, uniformly subdividing a unit square. 
Each of the subdomain matrices is purely local, thus they have the communicator PETSC_COMM_SELF. Each of them is of size n x n. There is a coarse grid matrix, with communicator PETSC_COMM_WORLD and of size m x m. The coupling matrices between the global coarse grid and the local matrices are also global, so they are of size 16n x m and m x 16n, respectively. So far, everything is fine and works perfectly. Now I introduce four "local coarse grids", each of them couples four local subdomains, and is defined on a subset communicator of PETSC_COMM_WORLD. Say, each "local coarse grid" matrix is of size p x p, and there are also coupling matrices of size 4n x p and p x 4n. Now I have to perform a MatMatMult of the local coarse coupling matrices p x 4n with the global coupling matrix 16n x m. So the final matrix is of size 4p x m. But I cannot perform the MatMatMult, as the matrix sizes do not fit and the communicators are not compatible. Is it possible to understand, what I want to do? :) Any idea, how to implement it? Thomas From knepley at gmail.com Thu Dec 13 10:43:36 2012 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Dec 2012 08:43:36 -0800 Subject: [petsc-users] Some tricky problem in my multilevel feti dp code In-Reply-To: <50C9F72D.2010006@tu-dresden.de> References: <50C9F72D.2010006@tu-dresden.de> Message-ID: On Thu, Dec 13, 2012 at 7:41 AM, Thomas Witkowski wrote: > I have some problem in the implementation of my multilevel FETI DP code, > where two block structured matrices must be multiplied. I'll give my best to > explain the problem, may be one of you have an idea how to implement it. I > think, the best is to make a small example: lets assume we have 16 > subdomains, uniformly subdividing a unit square. Each of the subdomain > matrices is purely local, thus they have the communicator PETSC_COMM_SELF. > Each of them is of size n x n. There is a coarse grid matrix, with > communicator PETSC_COMM_WORLD and of size m x m. The coupling matrices > between the global coarse grid and the local matrices are also global, so > they are of size 16n x m and m x 16n, respectively. So far, everything is > fine and works perfectly. Now I introduce four "local coarse grids", each of > them couples four local subdomains, and is defined on a subset communicator > of PETSC_COMM_WORLD. Say, each "local coarse grid" matrix is of size p x p, > and there are also coupling matrices of size 4n x p and p x 4n. Now I have > to perform a MatMatMult of the local coarse coupling matrices p x 4n with > the global coupling matrix 16n x m. So the final matrix is of size 4p x m. > But I cannot perform the MatMatMult, as the matrix sizes do not fit and the > communicators are not compatible. > > Is it possible to understand, what I want to do? :) Any idea, how to > implement it? It sounds like you need to redistribute the matrix before the MatMatMult. I think you can do this with MatGetSubmatrix(), if I understand your problem correctly. You probably need to move the matrix from the subcomm to the global comm first, with empty entries on some procs. I would just do it the naive way first, then profile to see how it does. Matt > Thomas -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From mark.adams at columbia.edu Thu Dec 13 11:22:41 2012 From: mark.adams at columbia.edu (Mark F. 
Adams) Date: Thu, 13 Dec 2012 12:22:41 -0500 Subject: [petsc-users] Some tricky problem in my multilevel feti dp code In-Reply-To: References: <50C9F72D.2010006@tu-dresden.de> Message-ID: <1F23D79C-569C-405B-A68C-BAB9E310C2D3@columbia.edu> You might also be able to put everything in a global comm. Then you have a bunch of block diagonal matrix ops. There is no performance penalty other then in reductions (e.g., when PETSc figures out its scatter stuff) and you have to keep track of where the local problem "starts" in the global matrix, but it might be simpler. Note, your local LU solves will now be a global looking block Jacobi with a sub LU solver, but its the same thing. On Dec 13, 2012, at 11:43 AM, Matthew Knepley wrote: > On Thu, Dec 13, 2012 at 7:41 AM, Thomas Witkowski > wrote: >> I have some problem in the implementation of my multilevel FETI DP code, >> where two block structured matrices must be multiplied. I'll give my best to >> explain the problem, may be one of you have an idea how to implement it. I >> think, the best is to make a small example: lets assume we have 16 >> subdomains, uniformly subdividing a unit square. Each of the subdomain >> matrices is purely local, thus they have the communicator PETSC_COMM_SELF. >> Each of them is of size n x n. There is a coarse grid matrix, with >> communicator PETSC_COMM_WORLD and of size m x m. The coupling matrices >> between the global coarse grid and the local matrices are also global, so >> they are of size 16n x m and m x 16n, respectively. So far, everything is >> fine and works perfectly. Now I introduce four "local coarse grids", each of >> them couples four local subdomains, and is defined on a subset communicator >> of PETSC_COMM_WORLD. Say, each "local coarse grid" matrix is of size p x p, >> and there are also coupling matrices of size 4n x p and p x 4n. Now I have >> to perform a MatMatMult of the local coarse coupling matrices p x 4n with >> the global coupling matrix 16n x m. So the final matrix is of size 4p x m. >> But I cannot perform the MatMatMult, as the matrix sizes do not fit and the >> communicators are not compatible. >> >> Is it possible to understand, what I want to do? :) Any idea, how to >> implement it? > > It sounds like you need to redistribute the matrix before the > MatMatMult. I think you > can do this with MatGetSubmatrix(), if I understand your problem > correctly. You probably > need to move the matrix from the subcomm to the global comm first, > with empty entries > on some procs. I would just do it the naive way first, then profile to > see how it does. > > Matt > >> Thomas > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. 
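A rough sketch of the single-communicator approach Mark suggests, using the sizes from Thomas's example: the four p x 4n group coupling blocks become one block-diagonal (4p) x (16n) MPIAIJ matrix on PETSC_COMM_WORLD, and the product with the global 16n x m coupling matrix is then an ordinary MatMatMult. Everything below is illustrative; p, n, the local row and column shares, the preallocation counts and the assembled matrix Bglob are placeholders, not code from this thread.

/* Sketch only: each rank owns prows_local of the 4p rows and ncols_local of
   the 16n columns, and inserts only the entries of its own group's p x 4n
   block, shifted by that group's global row and column offsets. */
Mat      Cglob, Bglob, CB;          /* Bglob: the existing 16n x m coupling matrix */
PetscInt prows_local, ncols_local, p, n, d_nz, o_nz;

ierr = MatCreate(PETSC_COMM_WORLD, &Cglob); CHKERRQ(ierr);
ierr = MatSetSizes(Cglob, prows_local, ncols_local, 4*p, 16*n); CHKERRQ(ierr);
ierr = MatSetType(Cglob, MATMPIAIJ); CHKERRQ(ierr);
ierr = MatMPIAIJSetPreallocation(Cglob, d_nz, PETSC_NULL, o_nz, PETSC_NULL); CHKERRQ(ierr);

/* ... MatSetValues() with globally shifted indices, on group members only ... */

ierr = MatAssemblyBegin(Cglob, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
ierr = MatAssemblyEnd(Cglob, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

/* sizes and communicators now match: (4p x 16n) * (16n x m) = 4p x m */
ierr = MatMatMult(Cglob, Bglob, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CB); CHKERRQ(ierr);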
> -- Norbert Wiener > From gokhalen at gmail.com Thu Dec 13 11:50:42 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 13 Dec 2012 12:50:42 -0500 Subject: [petsc-users] MatCreateComposite question Message-ID: I am trying to create a composite matrix - the relevant snippet of the code is - Mat KK, AA[3]; ierr = MatDuplicate(KFullMat,MAT_COPY_VALUES,&AA[0]);CHKERRQ(ierr); ierr = MatDuplicate(CFullMat,MAT_COPY_VALUES,&AA[1]); CHKERRQ(ierr); ierr = MatDuplicate(MFullMat,MAT_COPY_VALUES,&AA[2]); CHKERRQ(ierr); ierr = MatScale(AA[1],iomega);CHKERRQ(ierr); ierr = MatScale(AA[2],-forcomega2); CHKERRQ(ierr); ierr = MatCreateComposite(PETSC_COMM_WORLD,3,AA,&KK); CHKERRQ(ierr); ierr = MatCompositeMerge(KK); CHKERRQ(ierr); This crashes with the error at the end of the message. Do you have any ideas about what might be causing this? Is there any other debugging output I should send - log_summary perhaps? -Nachiket [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Object is in wrong state! [0]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatAssemblyBegin()! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Thu Dec 13 12:54:51 2012 [0]PETSC ERROR: Libraries linked from /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib [0]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 [2]PETSC ERROR: --------------------- Error Message ------------------------------------ [2]PETSC ERROR: Object is in wrong state! [2]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatAssemblyBegin()! [2]PETSC ERROR: ------------------------------------------------------------------------ [2]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012 [2]PETSC ERROR: See docs/changes/index.html for recent updates. [2]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [2]PETSC ERROR: See docs/index.html for manual pages. [2]PETSC ERROR: [3]PETSC ERROR: --------------------- Error Message ------------------------------------ [3]PETSC ERROR: Object is in wrong state! [3]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatAssemblyBegin()! [3]PETSC ERROR: ------------------------------------------------------------------------ [3]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012 [3]PETSC ERROR: See docs/changes/index.html for recent updates. [3]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [3]PETSC ERROR: See docs/index.html for manual pages. 
[3]PETSC ERROR: ------------------------------------------------------------------------ [3]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Thu Dec 13 12:54:51 2012 [3]PETSC ERROR: [0]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: MatAssemblyBegin() line 4683 in /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c [0]PETSC ERROR: MatCreateComposite() line 440 in /opt/petsc/petsc-3.3-p2/src/mat/impls/composite/mcomposite.c [0]PETSC ERROR: main() line 141 in src/examples/waigen.c ------------------------------------------------------------------------ [2]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Thu Dec 13 12:54:51 2012 [2]PETSC ERROR: Libraries linked from /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib [2]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 [2]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [2]PETSC ERROR: ------------------------------------------------------------------------ Libraries linked from /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib [3]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 [3]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [3]PETSC ERROR: ------------------------------------------------------------------------ [3]PETSC ERROR: MatAssemblyBegin() line 4683 in /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c [3]PETSC ERROR: application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 [2]PETSC ERROR: MatAssemblyBegin() line 4683 in /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c [2]PETSC ERROR: MatCreateComposite() line 440 in /opt/petsc/petsc-3.3-p2/src/mat/impls/composite/mcomposite.c MatCreateComposite() line 440 in /opt/petsc/petsc-3.3-p2/src/mat/impls/composite/mcomposite.c [3]PETSC ERROR: main() line 141 in src/examples/waigen.c [2]PETSC ERROR: main() line 141 in src/examples/waigen.c application called MPI_Abort(MPI_COMM_WORLD, 73) - process 3 application called MPI_Abort(MPI_COMM_WORLD, 73) - process 2 [cli_2]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 73) - process 2 [cli_3]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 73) - process 3 [1]PETSC ERROR: --------------------- Error Message ------------------------------------ [1]PETSC ERROR: Object is in wrong state! [1]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatAssemblyBegin()! 
[1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012 [1]PETSC ERROR: See docs/changes/index.html for recent updates. [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [1]PETSC ERROR: See docs/index.html for manual pages. [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Thu Dec 13 12:54:51 2012 [1]PETSC ERROR: Libraries linked from /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib [1]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 [1]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: MatAssemblyBegin() line 4683 in /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c [1]PETSC ERROR: MatCreateComposite() line 440 in /opt/petsc/petsc-3.3-p2/src/mat/impls/composite/mcomposite.c [1]PETSC ERROR: main() line 141 in src/examples/waigen.c application called MPI_Abort(MPI_COMM_WORLD, 73) - process 1 [cli_1]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 73) - process 1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhalen at gmail.com Thu Dec 13 12:47:35 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 13 Dec 2012 13:47:35 -0500 Subject: [petsc-users] MatCreateComposite question Message-ID: Sorry for replying to my own question, but it seems to be working in optimized mode, but not in Debug mode, which is strange. -Nachiket -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Thu Dec 13 12:55:47 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Thu, 13 Dec 2012 10:55:47 -0800 Subject: [petsc-users] MatCreateComposite question In-Reply-To: References: Message-ID: That's because the check isn't firing in optimized mode. Barry, should MatCreate_Composite() set mat->preallocated = TRUE because preallocation is implicit for composite matrices, or should the user/MatCreateComposite() be responsible for calling MatSetUp()? On Thu, Dec 13, 2012 at 10:47 AM, Nachiket Gokhale wrote: > Sorry for replying to my own question, but it seems to be working in > optimized mode, but not in Debug mode, which is strange. -Nachiket -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Dec 13 13:40:41 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 13 Dec 2012 13:40:41 -0600 Subject: [petsc-users] MatCreateComposite question In-Reply-To: References: Message-ID: <909099E5-7A0D-491B-B2CE-8B9C2C1A6F8A@mcs.anl.gov> On Dec 13, 2012, at 12:55 PM, Jed Brown wrote: > That's because the check isn't firing in optimized mode. > > Barry, should MatCreate_Composite() set mat->preallocated = TRUE because preallocation is implicit for composite matrices, or should the user/MatCreateComposite() be responsible for calling MatSetUp()? I am fine with having it click to mat->preallocated automatically for now. 
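For anyone hitting this on an unpatched petsc-3.3 debug build, one possible user-side workaround is to assemble the composite by hand rather than through MatCreateComposite(), so that MatSetUp() runs before the assembly that trips the preallocation check. This is only a sketch of that idea, assuming AA[0..2] are the three already-assembled component matrices from the snippet earlier in the thread and that they share the same layout; with the fix linked below applied it should not be needed.

Mat      KK;
PetscInt mloc, nloc, M, N;

ierr = MatCreate(PETSC_COMM_WORLD, &KK); CHKERRQ(ierr);
/* give the composite the same local and global sizes as its components */
ierr = MatGetLocalSize(AA[0], &mloc, &nloc); CHKERRQ(ierr);
ierr = MatGetSize(AA[0], &M, &N); CHKERRQ(ierr);
ierr = MatSetSizes(KK, mloc, nloc, M, N); CHKERRQ(ierr);
ierr = MatSetType(KK, MATCOMPOSITE); CHKERRQ(ierr);
ierr = MatSetUp(KK); CHKERRQ(ierr);                /* marks the matrix as preallocated */
ierr = MatCompositeAddMat(KK, AA[0]); CHKERRQ(ierr);
ierr = MatCompositeAddMat(KK, AA[1]); CHKERRQ(ierr);
ierr = MatCompositeAddMat(KK, AA[2]); CHKERRQ(ierr);
ierr = MatAssemblyBegin(KK, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
ierr = MatAssemblyEnd(KK, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
ierr = MatCompositeMerge(KK); CHKERRQ(ierr);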
Barry > > > On Thu, Dec 13, 2012 at 10:47 AM, Nachiket Gokhale wrote: > Sorry for replying to my own question, but it seems to be working in optimized mode, but not in Debug mode, which is strange. -Nachiket > From jedbrown at mcs.anl.gov Thu Dec 13 13:44:34 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Thu, 13 Dec 2012 11:44:34 -0800 Subject: [petsc-users] MatCreateComposite question In-Reply-To: <909099E5-7A0D-491B-B2CE-8B9C2C1A6F8A@mcs.anl.gov> References: <909099E5-7A0D-491B-B2CE-8B9C2C1A6F8A@mcs.anl.gov> Message-ID: https://bitbucket.org/petsc/petsc-3.3/commits/ceb522f2c6640c2934693f744f823595bb0438fc On Thu, Dec 13, 2012 at 11:40 AM, Barry Smith wrote: > > On Dec 13, 2012, at 12:55 PM, Jed Brown wrote: > > > That's because the check isn't firing in optimized mode. > > > > Barry, should MatCreate_Composite() set mat->preallocated = TRUE because > preallocation is implicit for composite matrices, or should the > user/MatCreateComposite() be responsible for calling MatSetUp()? > > I am fine with having it click to mat->preallocated automatically for > now. > > Barry > > > > > > > On Thu, Dec 13, 2012 at 10:47 AM, Nachiket Gokhale > wrote: > > Sorry for replying to my own question, but it seems to be working in > optimized mode, but not in Debug mode, which is strange. -Nachiket > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhalen at gmail.com Thu Dec 13 14:20:43 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 13 Dec 2012 15:20:43 -0500 Subject: [petsc-users] MatCreateComposite question In-Reply-To: References: <909099E5-7A0D-491B-B2CE-8B9C2C1A6F8A@mcs.anl.gov> Message-ID: Thanks, that seems to work. On Thu, Dec 13, 2012 at 2:44 PM, Jed Brown wrote: > > https://bitbucket.org/petsc/petsc-3.3/commits/ceb522f2c6640c2934693f744f823595bb0438fc > > > > On Thu, Dec 13, 2012 at 11:40 AM, Barry Smith wrote: > >> >> On Dec 13, 2012, at 12:55 PM, Jed Brown wrote: >> >> > That's because the check isn't firing in optimized mode. >> > >> > Barry, should MatCreate_Composite() set mat->preallocated = TRUE >> because preallocation is implicit for composite matrices, or should the >> user/MatCreateComposite() be responsible for calling MatSetUp()? >> >> I am fine with having it click to mat->preallocated automatically for >> now. >> >> Barry >> >> > >> > >> > On Thu, Dec 13, 2012 at 10:47 AM, Nachiket Gokhale >> wrote: >> > Sorry for replying to my own question, but it seems to be working in >> optimized mode, but not in Debug mode, which is strange. -Nachiket >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhalen at gmail.com Thu Dec 13 15:20:37 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 13 Dec 2012 16:20:37 -0500 Subject: [petsc-users] MUMPS Stuck Message-ID: I am trying to solve a complex matrix equation which was assembled using MatCompositeMerge using MUMPS and LU preconditioner. It seems to me that the solve is stuck in the factorization phase. It is taking 20 mins or so, using 16 processes. A problem of the same size using reals instead of complex was solved previously in approximately a minute using 4 processes. Mumps output of *-mat_mumps_icntl_4 1 *at the end of this email. Does anyone have any ideas about what the problem maybe ? 
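For context, the setup being described, a direct solve with MUMPS behind a KSP, usually amounts to the fragment below or the equivalent runtime options -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package mumps. The names KK, b and x are placeholders rather than the application's code; raising -mat_mumps_icntl_4 above 1 makes MUMPS print more of its statistics, which can help tell a slow factorization from a hung one.

/* Generic sketch: direct solve with MUMPS through KSP (petsc-3.3 calling
   sequence); KK is the assembled system matrix, b the right-hand side. */
KSP ksp;
PC  pc;

ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
ierr = KSPSetOperators(ksp, KK, KK, DIFFERENT_NONZERO_PATTERN); CHKERRQ(ierr);
ierr = KSPSetType(ksp, KSPPREONLY); CHKERRQ(ierr);
ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
ierr = PCSetType(pc, PCLU); CHKERRQ(ierr);
ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS); CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);
ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);   /* the numerical factorization happens here */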
Thanks, -Nachiket * * * * Entering ZMUMPS driver with JOB, N, NZ = 1 122370 0 ZMUMPS 4.10.0 L U Solver for unsymmetric matrices Type of parallelism: Working host ****** ANALYSIS STEP ******** ** Max-trans not allowed because matrix is distributed ... Structural symmetry (in percent)= 100 Density: NBdense, Average, Median = 0 42 26 Ordering based on METIS A root of estimated size 2736 has been selected for Scalapack. Leaving analysis phase with ... INFOG(1) = 0 INFOG(2) = 0 -- (20) Number of entries in factors (estim.) = 563723522 -- (3) Storage of factors (REAL, estimated) = 565185337 -- (4) Storage of factors (INT , estimated) = 3537003 -- (5) Maximum frontal size (estimated) = 15239 -- (6) Number of nodes in the tree = 7914 -- (32) Type of analysis effectively used = 1 -- (7) Ordering option effectively used = 5 ICNTL(6) Maximum transversal option = 0 ICNTL(7) Pivot order option = 7 Percentage of memory relaxation (effective) = 35 Number of level 2 nodes = 35 Number of split nodes = 8 RINFOG(1) Operations during elimination (estim)= 4.877D+12 Distributed matrix entry format (ICNTL(18)) = 3 ** Rank of proc needing largest memory in IC facto : 0 ** Estimated corresponding MBYTES for IC facto : 3661 ** Estimated avg. MBYTES per work. proc at facto (IC) : 2018 ** TOTAL space in MBYTES for IC factorization : 32289 ** Rank of proc needing largest memory for OOC facto : 0 ** Estimated corresponding MBYTES for OOC facto : 3462 ** Estimated avg. MBYTES per work. proc at facto (OOC) : 1787 ** TOTAL space in MBYTES for OOC factorization : 28599 Entering ZMUMPS driver with JOB, N, NZ = 2 122370 5211070 ****** FACTORIZATION STEP ******** GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... NUMBER OF WORKING PROCESSES = 16 OUT-OF-CORE OPTION (ICNTL(22)) = 0 REAL SPACE FOR FACTORS = 565185337 INTEGER SPACE FOR FACTORS = 3537003 MAXIMUM FRONTAL SIZE (ESTIMATED) = 15239 NUMBER OF NODES IN THE TREE = 7914 Convergence error after scaling for ONE-NORM (option 7/8) = 0.79D+00 Maximum effective relaxed size of S = 199523439 Average effective relaxed size of S = 98303057 REDISTRIB: TOTAL DATA LOCAL/SENT = 657185 14022665 GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.4805 ** Memory relaxation parameter ( ICNTL(14) ) : 35 ** Rank of processor needing largest memory in facto : 0 ** Space in MBYTES used by this processor for facto : 3661 ** Avg. Space in MBYTES per working proc during facto : 2018 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Dec 13 15:29:05 2012 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Dec 2012 13:29:05 -0800 Subject: [petsc-users] MUMPS Stuck In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 1:20 PM, Nachiket Gokhale wrote: > I am trying to solve a complex matrix equation which was assembled using > MatCompositeMerge using MUMPS and LU preconditioner. It seems to me that > the solve is stuck in the factorization phase. It is taking 20 mins or so, > using 16 processes. A problem of the same size using reals instead of > complex was solved previously in approximately a minute using 4 processes. > Mumps output of -mat_mumps_icntl_4 1 at the end of this email. Does anyone > have any ideas about what the problem maybe ? Complex arithmetic is much more expensive, and you can lose some of the optimizations made in the code. I think you have to wait longer than this. Also, you should try attaching the debugger to a process to see whether it is computing or waiting. 
Matt > Thanks, > > -Nachiket > > > > Entering ZMUMPS driver with JOB, N, NZ = 1 122370 0 > > ZMUMPS 4.10.0 > L U Solver for unsymmetric matrices > Type of parallelism: Working host > > ****** ANALYSIS STEP ******** > > ** Max-trans not allowed because matrix is distributed > ... Structural symmetry (in percent)= 100 > Density: NBdense, Average, Median = 0 42 26 > Ordering based on METIS > A root of estimated size 2736 has been selected for Scalapack. > > Leaving analysis phase with ... > INFOG(1) = 0 > INFOG(2) = 0 > -- (20) Number of entries in factors (estim.) = 563723522 > -- (3) Storage of factors (REAL, estimated) = 565185337 > -- (4) Storage of factors (INT , estimated) = 3537003 > -- (5) Maximum frontal size (estimated) = 15239 > -- (6) Number of nodes in the tree = 7914 > -- (32) Type of analysis effectively used = 1 > -- (7) Ordering option effectively used = 5 > ICNTL(6) Maximum transversal option = 0 > ICNTL(7) Pivot order option = 7 > Percentage of memory relaxation (effective) = 35 > Number of level 2 nodes = 35 > Number of split nodes = 8 > RINFOG(1) Operations during elimination (estim)= 4.877D+12 > Distributed matrix entry format (ICNTL(18)) = 3 > ** Rank of proc needing largest memory in IC facto : 0 > ** Estimated corresponding MBYTES for IC facto : 3661 > ** Estimated avg. MBYTES per work. proc at facto (IC) : 2018 > ** TOTAL space in MBYTES for IC factorization : 32289 > ** Rank of proc needing largest memory for OOC facto : 0 > ** Estimated corresponding MBYTES for OOC facto : 3462 > ** Estimated avg. MBYTES per work. proc at facto (OOC) : 1787 > ** TOTAL space in MBYTES for OOC factorization : 28599 > Entering ZMUMPS driver with JOB, N, NZ = 2 122370 5211070 > > ****** FACTORIZATION STEP ******** > > > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... > NUMBER OF WORKING PROCESSES = 16 > OUT-OF-CORE OPTION (ICNTL(22)) = 0 > REAL SPACE FOR FACTORS = 565185337 > INTEGER SPACE FOR FACTORS = 3537003 > MAXIMUM FRONTAL SIZE (ESTIMATED) = 15239 > NUMBER OF NODES IN THE TREE = 7914 > Convergence error after scaling for ONE-NORM (option 7/8) = 0.79D+00 > Maximum effective relaxed size of S = 199523439 > Average effective relaxed size of S = 98303057 > > REDISTRIB: TOTAL DATA LOCAL/SENT = 657185 14022665 > GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.4805 > ** Memory relaxation parameter ( ICNTL(14) ) : 35 > ** Rank of processor needing largest memory in facto : 0 > ** Space in MBYTES used by this processor for facto : 3661 > ** Avg. Space in MBYTES per working proc during facto : 2018 > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From gokhalen at gmail.com Thu Dec 13 15:44:44 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 13 Dec 2012 16:44:44 -0500 Subject: [petsc-users] MUMPS Stuck In-Reply-To: References: Message-ID: Thanks - should I attached the debugger in debug mode or in optimized mode? I suspect it will be tremendously slow in debug mode, otoh I am not sure if it will yield any useful information in optimized mode. Also, will -on_error_attach_debugger do the trick? -Nachiket On Thu, Dec 13, 2012 at 4:29 PM, Matthew Knepley wrote: > On Thu, Dec 13, 2012 at 1:20 PM, Nachiket Gokhale > wrote: > > I am trying to solve a complex matrix equation which was assembled using > > MatCompositeMerge using MUMPS and LU preconditioner. It seems to me that > > the solve is stuck in the factorization phase. 
It is taking 20 mins or > so, > > using 16 processes. A problem of the same size using reals instead of > > complex was solved previously in approximately a minute using 4 > processes. > > Mumps output of -mat_mumps_icntl_4 1 at the end of this email. Does > anyone > > have any ideas about what the problem maybe ? > > Complex arithmetic is much more expensive, and you can lose some of > the optimizations > made in the code. I think you have to wait longer than this. Also, you > should try attaching > the debugger to a process to see whether it is computing or waiting. > > Matt > > > Thanks, > > > > -Nachiket > > > > > > > > Entering ZMUMPS driver with JOB, N, NZ = 1 122370 0 > > > > ZMUMPS 4.10.0 > > L U Solver for unsymmetric matrices > > Type of parallelism: Working host > > > > ****** ANALYSIS STEP ******** > > > > ** Max-trans not allowed because matrix is distributed > > ... Structural symmetry (in percent)= 100 > > Density: NBdense, Average, Median = 0 42 26 > > Ordering based on METIS > > A root of estimated size 2736 has been selected for Scalapack. > > > > Leaving analysis phase with ... > > INFOG(1) = 0 > > INFOG(2) = 0 > > -- (20) Number of entries in factors (estim.) = 563723522 > > -- (3) Storage of factors (REAL, estimated) = 565185337 > > -- (4) Storage of factors (INT , estimated) = 3537003 > > -- (5) Maximum frontal size (estimated) = 15239 > > -- (6) Number of nodes in the tree = 7914 > > -- (32) Type of analysis effectively used = 1 > > -- (7) Ordering option effectively used = 5 > > ICNTL(6) Maximum transversal option = 0 > > ICNTL(7) Pivot order option = 7 > > Percentage of memory relaxation (effective) = 35 > > Number of level 2 nodes = 35 > > Number of split nodes = 8 > > RINFOG(1) Operations during elimination (estim)= 4.877D+12 > > Distributed matrix entry format (ICNTL(18)) = 3 > > ** Rank of proc needing largest memory in IC facto : 0 > > ** Estimated corresponding MBYTES for IC facto : 3661 > > ** Estimated avg. MBYTES per work. proc at facto (IC) : 2018 > > ** TOTAL space in MBYTES for IC factorization : 32289 > > ** Rank of proc needing largest memory for OOC facto : 0 > > ** Estimated corresponding MBYTES for OOC facto : 3462 > > ** Estimated avg. MBYTES per work. proc at facto (OOC) : 1787 > > ** TOTAL space in MBYTES for OOC factorization : 28599 > > Entering ZMUMPS driver with JOB, N, NZ = 2 122370 5211070 > > > > ****** FACTORIZATION STEP ******** > > > > > > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... > > NUMBER OF WORKING PROCESSES = 16 > > OUT-OF-CORE OPTION (ICNTL(22)) = 0 > > REAL SPACE FOR FACTORS = 565185337 > > INTEGER SPACE FOR FACTORS = 3537003 > > MAXIMUM FRONTAL SIZE (ESTIMATED) = 15239 > > NUMBER OF NODES IN THE TREE = 7914 > > Convergence error after scaling for ONE-NORM (option 7/8) = 0.79D+00 > > Maximum effective relaxed size of S = 199523439 > > Average effective relaxed size of S = 98303057 > > > > REDISTRIB: TOTAL DATA LOCAL/SENT = 657185 14022665 > > GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.4805 > > ** Memory relaxation parameter ( ICNTL(14) ) : 35 > > ** Rank of processor needing largest memory in facto : 0 > > ** Space in MBYTES used by this processor for facto : 3661 > > ** Avg. Space in MBYTES per working proc during facto : 2018 > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Thu Dec 13 16:19:32 2012 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Dec 2012 14:19:32 -0800 Subject: [petsc-users] MUMPS Stuck In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 1:44 PM, Nachiket Gokhale wrote: > Thanks - should I attached the debugger in debug mode or in optimized mode? > I suspect it will be tremendously slow in debug mode, otoh I am not sure if > it will yield any useful information in optimized mode. Optimized will still give a stack trace. > Also, will -on_error_attach_debugger do the trick? No, either spawn one -start_in_debugger -debugger_nodes 0, or attach using gdb -p Matt > -Nachiket > > On Thu, Dec 13, 2012 at 4:29 PM, Matthew Knepley wrote: >> >> On Thu, Dec 13, 2012 at 1:20 PM, Nachiket Gokhale >> wrote: >> > I am trying to solve a complex matrix equation which was assembled using >> > MatCompositeMerge using MUMPS and LU preconditioner. It seems to me >> > that >> > the solve is stuck in the factorization phase. It is taking 20 mins or >> > so, >> > using 16 processes. A problem of the same size using reals instead of >> > complex was solved previously in approximately a minute using 4 >> > processes. >> > Mumps output of -mat_mumps_icntl_4 1 at the end of this email. Does >> > anyone >> > have any ideas about what the problem maybe ? >> >> Complex arithmetic is much more expensive, and you can lose some of >> the optimizations >> made in the code. I think you have to wait longer than this. Also, you >> should try attaching >> the debugger to a process to see whether it is computing or waiting. >> >> Matt >> >> > Thanks, >> > >> > -Nachiket >> > >> > >> > >> > Entering ZMUMPS driver with JOB, N, NZ = 1 122370 0 >> > >> > ZMUMPS 4.10.0 >> > L U Solver for unsymmetric matrices >> > Type of parallelism: Working host >> > >> > ****** ANALYSIS STEP ******** >> > >> > ** Max-trans not allowed because matrix is distributed >> > ... Structural symmetry (in percent)= 100 >> > Density: NBdense, Average, Median = 0 42 26 >> > Ordering based on METIS >> > A root of estimated size 2736 has been selected for Scalapack. >> > >> > Leaving analysis phase with ... >> > INFOG(1) = 0 >> > INFOG(2) = 0 >> > -- (20) Number of entries in factors (estim.) = 563723522 >> > -- (3) Storage of factors (REAL, estimated) = 565185337 >> > -- (4) Storage of factors (INT , estimated) = 3537003 >> > -- (5) Maximum frontal size (estimated) = 15239 >> > -- (6) Number of nodes in the tree = 7914 >> > -- (32) Type of analysis effectively used = 1 >> > -- (7) Ordering option effectively used = 5 >> > ICNTL(6) Maximum transversal option = 0 >> > ICNTL(7) Pivot order option = 7 >> > Percentage of memory relaxation (effective) = 35 >> > Number of level 2 nodes = 35 >> > Number of split nodes = 8 >> > RINFOG(1) Operations during elimination (estim)= 4.877D+12 >> > Distributed matrix entry format (ICNTL(18)) = 3 >> > ** Rank of proc needing largest memory in IC facto : 0 >> > ** Estimated corresponding MBYTES for IC facto : 3661 >> > ** Estimated avg. MBYTES per work. proc at facto (IC) : 2018 >> > ** TOTAL space in MBYTES for IC factorization : 32289 >> > ** Rank of proc needing largest memory for OOC facto : 0 >> > ** Estimated corresponding MBYTES for OOC facto : 3462 >> > ** Estimated avg. MBYTES per work. 
proc at facto (OOC) : 1787 >> > ** TOTAL space in MBYTES for OOC factorization : 28599 >> > Entering ZMUMPS driver with JOB, N, NZ = 2 122370 5211070 >> > >> > ****** FACTORIZATION STEP ******** >> > >> > >> > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... >> > NUMBER OF WORKING PROCESSES = 16 >> > OUT-OF-CORE OPTION (ICNTL(22)) = 0 >> > REAL SPACE FOR FACTORS = 565185337 >> > INTEGER SPACE FOR FACTORS = 3537003 >> > MAXIMUM FRONTAL SIZE (ESTIMATED) = 15239 >> > NUMBER OF NODES IN THE TREE = 7914 >> > Convergence error after scaling for ONE-NORM (option 7/8) = 0.79D+00 >> > Maximum effective relaxed size of S = 199523439 >> > Average effective relaxed size of S = 98303057 >> > >> > REDISTRIB: TOTAL DATA LOCAL/SENT = 657185 14022665 >> > GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.4805 >> > ** Memory relaxation parameter ( ICNTL(14) ) : 35 >> > ** Rank of processor needing largest memory in facto : 0 >> > ** Space in MBYTES used by this processor for facto : 3661 >> > ** Avg. Space in MBYTES per working proc during facto : 2018 >> > >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From gokhalen at gmail.com Thu Dec 13 17:03:13 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 13 Dec 2012 18:03:13 -0500 Subject: [petsc-users] MUMPS Stuck In-Reply-To: References: Message-ID: The factorizations seem to be going through. It seem to take 40 mins or so per factorization. -Nachiket On Thu, Dec 13, 2012 at 5:19 PM, Matthew Knepley wrote: > On Thu, Dec 13, 2012 at 1:44 PM, Nachiket Gokhale > wrote: > > Thanks - should I attached the debugger in debug mode or in optimized > mode? > > I suspect it will be tremendously slow in debug mode, otoh I am not sure > if > > it will yield any useful information in optimized mode. > > Optimized will still give a stack trace. > > > Also, will -on_error_attach_debugger do the trick? > > No, either spawn one -start_in_debugger -debugger_nodes 0, or attach > using gdb -p > > Matt > > > -Nachiket > > > > On Thu, Dec 13, 2012 at 4:29 PM, Matthew Knepley > wrote: > >> > >> On Thu, Dec 13, 2012 at 1:20 PM, Nachiket Gokhale > >> wrote: > >> > I am trying to solve a complex matrix equation which was assembled > using > >> > MatCompositeMerge using MUMPS and LU preconditioner. It seems to me > >> > that > >> > the solve is stuck in the factorization phase. It is taking 20 mins or > >> > so, > >> > using 16 processes. A problem of the same size using reals instead of > >> > complex was solved previously in approximately a minute using 4 > >> > processes. > >> > Mumps output of -mat_mumps_icntl_4 1 at the end of this email. Does > >> > anyone > >> > have any ideas about what the problem maybe ? > >> > >> Complex arithmetic is much more expensive, and you can lose some of > >> the optimizations > >> made in the code. I think you have to wait longer than this. Also, you > >> should try attaching > >> the debugger to a process to see whether it is computing or waiting. 
> >> > >> Matt > >> > >> > Thanks, > >> > > >> > -Nachiket > >> > > >> > > >> > > >> > Entering ZMUMPS driver with JOB, N, NZ = 1 122370 > 0 > >> > > >> > ZMUMPS 4.10.0 > >> > L U Solver for unsymmetric matrices > >> > Type of parallelism: Working host > >> > > >> > ****** ANALYSIS STEP ******** > >> > > >> > ** Max-trans not allowed because matrix is distributed > >> > ... Structural symmetry (in percent)= 100 > >> > Density: NBdense, Average, Median = 0 42 26 > >> > Ordering based on METIS > >> > A root of estimated size 2736 has been selected for > Scalapack. > >> > > >> > Leaving analysis phase with ... > >> > INFOG(1) = 0 > >> > INFOG(2) = 0 > >> > -- (20) Number of entries in factors (estim.) = 563723522 > >> > -- (3) Storage of factors (REAL, estimated) = 565185337 > >> > -- (4) Storage of factors (INT , estimated) = 3537003 > >> > -- (5) Maximum frontal size (estimated) = 15239 > >> > -- (6) Number of nodes in the tree = 7914 > >> > -- (32) Type of analysis effectively used = 1 > >> > -- (7) Ordering option effectively used = 5 > >> > ICNTL(6) Maximum transversal option = 0 > >> > ICNTL(7) Pivot order option = 7 > >> > Percentage of memory relaxation (effective) = 35 > >> > Number of level 2 nodes = 35 > >> > Number of split nodes = 8 > >> > RINFOG(1) Operations during elimination (estim)= 4.877D+12 > >> > Distributed matrix entry format (ICNTL(18)) = 3 > >> > ** Rank of proc needing largest memory in IC facto : 0 > >> > ** Estimated corresponding MBYTES for IC facto : 3661 > >> > ** Estimated avg. MBYTES per work. proc at facto (IC) : 2018 > >> > ** TOTAL space in MBYTES for IC factorization : 32289 > >> > ** Rank of proc needing largest memory for OOC facto : 0 > >> > ** Estimated corresponding MBYTES for OOC facto : 3462 > >> > ** Estimated avg. MBYTES per work. proc at facto (OOC) : 1787 > >> > ** TOTAL space in MBYTES for OOC factorization : 28599 > >> > Entering ZMUMPS driver with JOB, N, NZ = 2 122370 > 5211070 > >> > > >> > ****** FACTORIZATION STEP ******** > >> > > >> > > >> > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... > >> > NUMBER OF WORKING PROCESSES = 16 > >> > OUT-OF-CORE OPTION (ICNTL(22)) = 0 > >> > REAL SPACE FOR FACTORS = 565185337 > >> > INTEGER SPACE FOR FACTORS = 3537003 > >> > MAXIMUM FRONTAL SIZE (ESTIMATED) = 15239 > >> > NUMBER OF NODES IN THE TREE = 7914 > >> > Convergence error after scaling for ONE-NORM (option 7/8) = > 0.79D+00 > >> > Maximum effective relaxed size of S = 199523439 > >> > Average effective relaxed size of S = 98303057 > >> > > >> > REDISTRIB: TOTAL DATA LOCAL/SENT = 657185 14022665 > >> > GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.4805 > >> > ** Memory relaxation parameter ( ICNTL(14) ) : 35 > >> > ** Rank of processor needing largest memory in facto : 0 > >> > ** Space in MBYTES used by this processor for facto : 3661 > >> > ** Avg. Space in MBYTES per working proc during facto : 2018 > >> > > >> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which > >> their experiments lead. > >> -- Norbert Wiener > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From malexe at vt.edu Fri Dec 14 04:59:56 2012 From: malexe at vt.edu (Mihai Alexe) Date: Fri, 14 Dec 2012 11:59:56 +0100 Subject: [petsc-users] Is MatSetUp required with MatCreateNormal and MatCreateMPIAIJWithSplitArrays? In-Reply-To: <269A4421-4AB1-4829-A578-5B66CEF5C6C9@mcs.anl.gov> References: <269A4421-4AB1-4829-A578-5B66CEF5C6C9@mcs.anl.gov> Message-ID: Barry, I've tracked down the problem. I ran with -info -mat_view_info, and fpe's enabled and got a SIGFPE after entering MatCreateMPIAIJWithSplitArrays (Petsc did not produce a stacktrace unfortunately). This was due to a floating point exception in a typecast inside mat/interface/matrix.c: if (mat->ops->getinfo) { MatInfo info; ierr = MatGetInfo(mat,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr); ierr = PetscViewerASCIIPrintf(viewer,"*total: nonzeros=%D*, allocated nonzeros=%D\n",*(PetscInt)info.nz_used* ,(PetscInt)info.nz_allocated);CHKERRQ(ierr); ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); } My sparse matrix has about 6 billion nonzeros. When I disable FPEs, i get a silent overflow when converting MatInfo.nz_used from PetscLogDouble to (32-bit) PetscInt: Matrix Object: 96 MPI processes type: mpiaij rows=131857963, cols=18752388 total: *nonzeros=-2147483648*, allocated nonzeros=0 and the code runs just fine. Maybe PETSc should cast nz_used to a long int? Mihai On Thu, Nov 29, 2012 at 6:25 PM, Barry Smith wrote: > > On Nov 29, 2012, at 9:48 AM, Mihai Alexe wrote: > > > Hello all, > > > > I am creating a large rectangular MPIAIJ matrix, then a shell > NormalMatrix that eventually gets passed to a KSP object (all part of a > constrained least-squares solver). > > Code looks as follows: > > > > //user.A_mat and user.Hess are PETSc Mat > > > > info = MatCreateMPIAIJWithSplitArrays( PETSC_COMM_WORLD, *locrow, > *loccol, nrow, > > *ncol, onrowidx, oncolidx, > > (PetscScalar*) onvals, offrowidx, > offcolidx, > > (PetscScalar*) values, &user.A_mat ); > CHKERRQ(info); > > > > info = MatCreateNormal( user.A_mat, &user.Hess ); CHKERRQ(info); > > info = MatSetUp( user.Hess ); > > > > Is MatSetUp() required for A or Hess to be initialized correctly? Or > some call to MatSetPreallocation? > ' > No you shouldn't need them. 
Try with valgrind > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > Barry > > > > > My code crashes after displaying (with -info -mat_view_info): > > > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399113024 > 67425648 max tags = 2147483647 > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399112000 > 67760592 max tags = 2147483647 > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to > -mat_no_inode > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage > space: 0 unneeded,34572269 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=1508490 > > total: nonzeros=34572269, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47534399112000 67760592 > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to > -mat_no_inode > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage > space: 0 unneeded,1762711 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=18752388 > > total: nonzeros=1762711, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage > space: 0 unneeded,34572269 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=1508490 > > total: nonzeros=34572269, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage > space: 0 unneeded,1762711 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=18752388 > > total: nonzeros=1762711, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage > space: 0 unneeded,34572269 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47534399112000 67760592 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47534399112000 67760592 > > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > > [0] VecScatterCreate(): General case: MPI to Seq > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 38109; storage > space: 0 unneeded,1762711 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > Matrix Object: 160 MPI processes > > type: mpiaij > > rows=131858910, cols=18752388 > > > > The code ran just fine on a smaller (pruned) input dataset. > > I don't get a stacktrace unfortunately... 
(running in production mode, > trying to switch to debug mode now). > > > > > > Regards, > > Mihai > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Dec 14 07:28:34 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 14 Dec 2012 07:28:34 -0600 Subject: [petsc-users] Is MatSetUp required with MatCreateNormal and MatCreateMPIAIJWithSplitArrays? In-Reply-To: References: <269A4421-4AB1-4829-A578-5B66CEF5C6C9@mcs.anl.gov> Message-ID: <3647C5B0-A0C4-4BB9-8017-371BA2AB8B68@mcs.anl.gov> Mihai, Thanks for tracking down the problem. As a side note, you are getting close to using all of the space in int in your matrix row/column sizes, when you matrix sizes are great than 2^{31}-1 you will need to configure PETSc with --with-64-bit-indices to have PETSc use long long int for PetscInt. Satish, Could you please patch 3.3 and replace the use of %D with %lld and replace the (PetscInt) casts with (long long int) casts in the two lines ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%D, allocated nonzeros=%D\n",(PetscInt)info.nz_used,(PetscInt)info.nz_allocated);CHKERRQ(ierr); ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); Thanks Barry On Dec 14, 2012, at 4:59 AM, Mihai Alexe wrote: > Barry, > > I've tracked down the problem. > > I ran with -info -mat_view_info, and fpe's enabled and got a SIGFPE after entering MatCreateMPIAIJWithSplitArrays (Petsc did not produce a stacktrace unfortunately). This was due to a floating point exception in a typecast inside mat/interface/matrix.c: > > if (mat->ops->getinfo) { > MatInfo info; > ierr = MatGetInfo(mat,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr); > ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%D, allocated nonzeros=%D\n",(PetscInt)info.nz_used,(PetscInt)info.nz_allocated);CHKERRQ(ierr); > ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); > } > > My sparse matrix has about 6 billion nonzeros. When I disable FPEs, i get a silent overflow when converting MatInfo.nz_used from PetscLogDouble to (32-bit) PetscInt: > > Matrix Object: 96 MPI processes > type: mpiaij > rows=131857963, cols=18752388 > total: nonzeros=-2147483648, allocated nonzeros=0 > > and the code runs just fine. Maybe PETSc should cast nz_used to a long int? > > > Mihai > On Thu, Nov 29, 2012 at 6:25 PM, Barry Smith wrote: > > On Nov 29, 2012, at 9:48 AM, Mihai Alexe wrote: > > > Hello all, > > > > I am creating a large rectangular MPIAIJ matrix, then a shell NormalMatrix that eventually gets passed to a KSP object (all part of a constrained least-squares solver). > > Code looks as follows: > > > > //user.A_mat and user.Hess are PETSc Mat > > > > info = MatCreateMPIAIJWithSplitArrays( PETSC_COMM_WORLD, *locrow, *loccol, nrow, > > *ncol, onrowidx, oncolidx, > > (PetscScalar*) onvals, offrowidx, offcolidx, > > (PetscScalar*) values, &user.A_mat ); CHKERRQ(info); > > > > info = MatCreateNormal( user.A_mat, &user.Hess ); CHKERRQ(info); > > info = MatSetUp( user.Hess ); > > > > Is MatSetUp() required for A or Hess to be initialized correctly? Or some call to MatSetPreallocation? > ' > No you shouldn't need them. 
Try with valgrind http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > Barry > > > > > My code crashes after displaying (with -info -mat_view_info): > > > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399113024 67425648 max tags = 2147483647 > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399112000 67760592 max tags = 2147483647 > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to -mat_no_inode > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=1508490 > > total: nonzeros=34572269, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592 > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to -mat_no_inode > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage space: 0 unneeded,1762711 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=18752388 > > total: nonzeros=1762711, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=1508490 > > total: nonzeros=34572269, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage space: 0 unneeded,1762711 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=18752388 > > total: nonzeros=1762711, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592 > > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > > [0] VecScatterCreate(): General case: MPI to Seq > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 38109; storage space: 0 unneeded,1762711 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > Matrix Object: 160 MPI processes > > type: mpiaij > > rows=131858910, cols=18752388 > > > > The code ran just fine on a smaller (pruned) input dataset. > > I don't get a stacktrace unfortunately... 
(running in production mode, trying to switch to debug mode now). > > > > > > Regards, > > Mihai > > > > From malexe at vt.edu Fri Dec 14 07:39:35 2012 From: malexe at vt.edu (Mihai Alexe) Date: Fri, 14 Dec 2012 14:39:35 +0100 Subject: [petsc-users] Is MatSetUp required with MatCreateNormal and MatCreateMPIAIJWithSplitArrays? In-Reply-To: <3647C5B0-A0C4-4BB9-8017-371BA2AB8B68@mcs.anl.gov> References: <269A4421-4AB1-4829-A578-5B66CEF5C6C9@mcs.anl.gov> <3647C5B0-A0C4-4BB9-8017-371BA2AB8B68@mcs.anl.gov> Message-ID: Barry, Indeed. As a side remark, the number of unknowns for my least-squares problem is well within the maximum 32-bit integer limit. That's why I did not immediately think that 32-bit ints may cause a problem. It's only the matrix nonzero count that goes over that bound. Quick overview of my "A": mglb (rows) =131857963, nglb (cols) =18752388, nnz_glb (nonzeros) = 5812947924 Going to 64-bit integers is not really an option. Long story short, I am working in single-precision mode, and the PETSc code is called from a Fortran kernel where we have imposed an EQUIVALENCE between single precision floats and ints (legacy design...) Best, Mihai On Fri, Dec 14, 2012 at 2:28 PM, Barry Smith wrote: > > Mihai, > > Thanks for tracking down the problem. As a side note, you are getting > close to using all of the space in int in your matrix row/column sizes, > when you matrix sizes are great than 2^{31}-1 you will need to configure > PETSc with --with-64-bit-indices to have PETSc use long long int for > PetscInt. > > Satish, > > Could you please patch 3.3 and replace the use of %D with %lld and > replace the (PetscInt) casts with (long long int) casts in the two lines > ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%D, allocated > nonzeros=%D\n",(PetscInt)info.nz_used,(PetscInt)info.nz_allocated);CHKERRQ(ierr); > ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used > during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); > > Thanks > > Barry > > On Dec 14, 2012, at 4:59 AM, Mihai Alexe wrote: > > > Barry, > > > > I've tracked down the problem. > > > > I ran with -info -mat_view_info, and fpe's enabled and got a SIGFPE > after entering MatCreateMPIAIJWithSplitArrays (Petsc did not produce a > stacktrace unfortunately). This was due to a floating point exception in a > typecast inside mat/interface/matrix.c: > > > > if (mat->ops->getinfo) { > > MatInfo info; > > ierr = MatGetInfo(mat,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr); > > ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%D, > allocated > nonzeros=%D\n",(PetscInt)info.nz_used,(PetscInt)info.nz_allocated);CHKERRQ(ierr); > > ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs > used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); > > } > > > > My sparse matrix has about 6 billion nonzeros. When I disable FPEs, i > get a silent overflow when converting MatInfo.nz_used from PetscLogDouble > to (32-bit) PetscInt: > > > > Matrix Object: 96 MPI processes > > type: mpiaij > > rows=131857963, cols=18752388 > > total: nonzeros=-2147483648, allocated nonzeros=0 > > > > and the code runs just fine. Maybe PETSc should cast nz_used to a long > int? > > > > > > Mihai > > On Thu, Nov 29, 2012 at 6:25 PM, Barry Smith wrote: > > > > On Nov 29, 2012, at 9:48 AM, Mihai Alexe wrote: > > > > > Hello all, > > > > > > I am creating a large rectangular MPIAIJ matrix, then a shell > NormalMatrix that eventually gets passed to a KSP object (all part of a > constrained least-squares solver). 
> > > Code looks as follows: > > > > > > //user.A_mat and user.Hess are PETSc Mat > > > > > > info = MatCreateMPIAIJWithSplitArrays( PETSC_COMM_WORLD, *locrow, > *loccol, nrow, > > > *ncol, onrowidx, oncolidx, > > > (PetscScalar*) onvals, offrowidx, > offcolidx, > > > (PetscScalar*) values, &user.A_mat ); > CHKERRQ(info); > > > > > > info = MatCreateNormal( user.A_mat, &user.Hess ); CHKERRQ(info); > > > info = MatSetUp( user.Hess ); > > > > > > Is MatSetUp() required for A or Hess to be initialized correctly? Or > some call to MatSetPreallocation? > > ' > > No you shouldn't need them. Try with valgrind > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > > Barry > > > > > > > > My code crashes after displaying (with -info -mat_view_info): > > > > > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399113024 > 67425648 max tags = 2147483647 > > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399112000 > 67760592 max tags = 2147483647 > > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to > -mat_no_inode > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage > space: 0 unneeded,34572269 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=1508490 > > > total: nonzeros=34572269, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47534399112000 67760592 > > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to > -mat_no_inode > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage > space: 0 unneeded,1762711 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=18752388 > > > total: nonzeros=1762711, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage > space: 0 unneeded,34572269 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=1508490 > > > total: nonzeros=34572269, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage > space: 0 unneeded,1762711 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=18752388 > > > total: nonzeros=1762711, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage > space: 0 unneeded,34572269 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > > [0] PetscCommDuplicate(): Using internal PETSc 
communicator > 47534399112000 67760592 > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47534399112000 67760592 > > > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > > > [0] VecScatterCreate(): General case: MPI to Seq > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 38109; storage > space: 0 unneeded,1762711 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > > Matrix Object: 160 MPI processes > > > type: mpiaij > > > rows=131858910, cols=18752388 > > > > > > The code ran just fine on a smaller (pruned) input dataset. > > > I don't get a stacktrace unfortunately... (running in production mode, > trying to switch to debug mode now). > > > > > > > > > Regards, > > > Mihai > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Dec 14 10:26:03 2012 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 14 Dec 2012 10:26:03 -0600 (CST) Subject: [petsc-users] Is MatSetUp required with MatCreateNormal and MatCreateMPIAIJWithSplitArrays? In-Reply-To: <3647C5B0-A0C4-4BB9-8017-371BA2AB8B68@mcs.anl.gov> References: <269A4421-4AB1-4829-A578-5B66CEF5C6C9@mcs.anl.gov> <3647C5B0-A0C4-4BB9-8017-371BA2AB8B68@mcs.anl.gov> Message-ID: pushed https://bitbucket.org/petsc/petsc-3.3/commits/6dac937a3eace3b81d6dcbb945ee7a85 Satish On Fri, 14 Dec 2012, Barry Smith wrote: > > Mihai, > > Thanks for tracking down the problem. As a side note, you are getting close to using all of the space in int in your matrix row/column sizes, when you matrix sizes are great than 2^{31}-1 you will need to configure PETSc with --with-64-bit-indices to have PETSc use long long int for PetscInt. > > Satish, > > Could you please patch 3.3 and replace the use of %D with %lld and replace the (PetscInt) casts with (long long int) casts in the two lines > ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%D, allocated nonzeros=%D\n",(PetscInt)info.nz_used,(PetscInt)info.nz_allocated);CHKERRQ(ierr); > ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); > > Thanks > > Barry > > On Dec 14, 2012, at 4:59 AM, Mihai Alexe wrote: > > > Barry, > > > > I've tracked down the problem. > > > > I ran with -info -mat_view_info, and fpe's enabled and got a SIGFPE after entering MatCreateMPIAIJWithSplitArrays (Petsc did not produce a stacktrace unfortunately). This was due to a floating point exception in a typecast inside mat/interface/matrix.c: > > > > if (mat->ops->getinfo) { > > MatInfo info; > > ierr = MatGetInfo(mat,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr); > > ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%D, allocated nonzeros=%D\n",(PetscInt)info.nz_used,(PetscInt)info.nz_allocated);CHKERRQ(ierr); > > ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); > > } > > > > My sparse matrix has about 6 billion nonzeros. When I disable FPEs, i get a silent overflow when converting MatInfo.nz_used from PetscLogDouble to (32-bit) PetscInt: > > > > Matrix Object: 96 MPI processes > > type: mpiaij > > rows=131857963, cols=18752388 > > total: nonzeros=-2147483648, allocated nonzeros=0 > > > > and the code runs just fine. Maybe PETSc should cast nz_used to a long int? 
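For reference, the change Barry asks for above amounts to roughly the following in mat/interface/matrix.c. This is only a sketch: the format is widened to %lld and the casts to long long int so that counts above 2^31-1 are no longer truncated, and the patch actually pushed may differ in detail.

   if (mat->ops->getinfo) {
     MatInfo info;
     ierr = MatGetInfo(mat,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr);
     /* print the counters through long long int instead of truncating to a 32-bit PetscInt */
     ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%lld, allocated nonzeros=%lld\n",(long long int)info.nz_used,(long long int)info.nz_allocated);CHKERRQ(ierr);
     ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%lld\n",(long long int)info.mallocs);CHKERRQ(ierr);
   }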
> > > > > > Mihai > > On Thu, Nov 29, 2012 at 6:25 PM, Barry Smith wrote: > > > > On Nov 29, 2012, at 9:48 AM, Mihai Alexe wrote: > > > > > Hello all, > > > > > > I am creating a large rectangular MPIAIJ matrix, then a shell NormalMatrix that eventually gets passed to a KSP object (all part of a constrained least-squares solver). > > > Code looks as follows: > > > > > > //user.A_mat and user.Hess are PETSc Mat > > > > > > info = MatCreateMPIAIJWithSplitArrays( PETSC_COMM_WORLD, *locrow, *loccol, nrow, > > > *ncol, onrowidx, oncolidx, > > > (PetscScalar*) onvals, offrowidx, offcolidx, > > > (PetscScalar*) values, &user.A_mat ); CHKERRQ(info); > > > > > > info = MatCreateNormal( user.A_mat, &user.Hess ); CHKERRQ(info); > > > info = MatSetUp( user.Hess ); > > > > > > Is MatSetUp() required for A or Hess to be initialized correctly? Or some call to MatSetPreallocation? > > ' > > No you shouldn't need them. Try with valgrind http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > > Barry > > > > > > > > My code crashes after displaying (with -info -mat_view_info): > > > > > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399113024 67425648 max tags = 2147483647 > > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399112000 67760592 max tags = 2147483647 > > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to -mat_no_inode > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=1508490 > > > total: nonzeros=34572269, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592 > > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to -mat_no_inode > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage space: 0 unneeded,1762711 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=18752388 > > > total: nonzeros=1762711, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=1508490 > > > total: nonzeros=34572269, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage space: 0 unneeded,1762711 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=18752388 > > > total: nonzeros=1762711, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] 
MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592 > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592 > > > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > > > [0] VecScatterCreate(): General case: MPI to Seq > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 38109; storage space: 0 unneeded,1762711 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > > Matrix Object: 160 MPI processes > > > type: mpiaij > > > rows=131858910, cols=18752388 > > > > > > The code ran just fine on a smaller (pruned) input dataset. > > > I don't get a stacktrace unfortunately... (running in production mode, trying to switch to debug mode now). > > > > > > > > > Regards, > > > Mihai > > > > > > > > > From gokhalen at gmail.com Mon Dec 17 15:54:44 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Mon, 17 Dec 2012 16:54:44 -0500 Subject: [petsc-users] MatMatMult Question Message-ID: I am trying to multiply the transpose of a matrix with another matrix using matmatmult. The transpose operator is created using MatTranspose. I get the error at the end of the email. Is this an error saying that cols of matrix 1 and not equal to the rows of matrix 2? I checked and the rows and columns seem to allow matrix multiplication: Left = (54,1760), Right=(1760,54). The snipped of the code which produces this is ierr = MatMatMult(*KFullMat,ProjR,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&TempMat);CHKERRQ(ierr); ierr = MatGetSize(ProjLT,&nrowl,&ncoll); CHKERRQ(ierr); ierr = MatGetSize(TempMat,&nrowr,&ncolr); CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD,"Left = (%d,%d), Right=(%d,%d)\n",nrowl,ncoll,nrowr,ncolr); CHKERRQ(ierr); ierr = MatMatMult(ProjLT,TempMat,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&KProj); CHKERRQ(ierr); ierr = MatDestroy(&TempMat); Thanks, -Nachiket [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Arguments are incompatible! [0]PETSC ERROR: MatMatMult requires A, seqdense, to be compatible with B, seqdense! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Mon Dec 17 16:58:13 2012 [0]PETSC ERROR: Libraries linked from /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib [0]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 [0]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: MatMatMult() line 8601 in /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c [0]PETSC ERROR: waigensolvprojforc() line 33 in src/examples/waigensolvprojforc.c -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Mon Dec 17 22:05:19 2012 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 17 Dec 2012 22:05:19 -0600 Subject: [petsc-users] MatMatMult Question In-Reply-To: References: Message-ID: Nachiket : Which version of petsc is used? Did you run the code in sequential? > ------------------------------------ > [0]PETSC ERROR: Arguments are incompatible! > [0]PETSC ERROR: MatMatMult requires A, seqdense, to be compatible with B, > seqdense! [0]PETSC ERROR: MatMatMult() line 8601 in /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c The line (8601) does not match the latest petsc-3.3 and petsc-dev. I cannot reproduce this error from the latest petsc-3.3 and petsc-dev. The error complains about a failure to find MatMatMult_SeqDense_SeqDense(). Can you update to the latest petsc-3.3 or petsc-dev and see if your code still crashes? Hong > I am trying to multiply the transpose of a matrix with another matrix using > matmatmult. The transpose operator is created using MatTranspose. I get the > error at the end of the email. Is this an error saying that cols of matrix 1 > and not equal to the rows of matrix 2? I checked and the rows and columns > seem to allow matrix multiplication: Left = (54,1760), Right=(1760,54). The > snipped of the code which produces this is > > ierr = > MatMatMult(*KFullMat,ProjR,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&TempMat);CHKERRQ(ierr); > ierr = MatGetSize(ProjLT,&nrowl,&ncoll); CHKERRQ(ierr); > ierr = MatGetSize(TempMat,&nrowr,&ncolr); CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD,"Left = (%d,%d), > Right=(%d,%d)\n",nrowl,ncoll,nrowr,ncolr); CHKERRQ(ierr); > ierr = MatMatMult(ProjLT,TempMat,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&KProj); > CHKERRQ(ierr); > ierr = MatDestroy(&TempMat); > > Thanks, > > -Nachiket > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Arguments are incompatible! > [0]PETSC ERROR: MatMatMult requires A, seqdense, to be compatible with B, > seqdense! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 > CDT 2012 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a > linux-gcc named asd1.wai.com by gokhale Mon Dec 17 16:58:13 2012 > [0]PETSC ERROR: Libraries linked from > /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib > [0]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 > [0]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 > --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ > --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 > --download-parmetis=1 --download-metis --download-scalapack=1 > --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: MatMatMult() line 8601 in > /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c > [0]PETSC ERROR: waigensolvprojforc() line 33 in > src/examples/waigensolvprojforc.c > From gokhalen at gmail.com Tue Dec 18 09:36:30 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Tue, 18 Dec 2012 10:36:30 -0500 Subject: [petsc-users] MatMatMult Question In-Reply-To: References: Message-ID: Hong: I used 3.3-p2 but I was able to reproduce this error with 3.3-p5 as well. I ran it with one MPI process. I got the same error, [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Arguments are incompatible! [0]PETSC ERROR: MatMatMult requires A, seqdense, to be compatible with B, seqdense! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 5, Sat Dec 1 15:10:41 CST 2012 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Tue Dec 18 10:44:18 2012 [0]PETSC ERROR: Libraries linked from /opt/petsc/petsc-3.3-p5/linux-gcc-g++-mpich-mumps-complex-debug/lib [0]PETSC ERROR: Configure run at Tue Dec 18 10:09:32 2012 [0]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: MatMatMult() line 8617 in /opt/petsc/petsc-3.3-p5/src/mat/interface/matrix.c [0]PETSC ERROR: waigensolvprojforc() line 31 in src/examples/waigensolvprojforc.c Using one MPI process, this error goes away when I make a temporary matrix and store the result of the first multiplication in it as in: ierr = MatLoad(ProjR,viewer);CHKERRQ(ierr); ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr); ierr = MatMatMult(*KFullMat,ProjR,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&TempMat);CHKERRQ(ierr); ierr = MatDuplicate(TempMat,MAT_COPY_VALUES,&TempMat2); CHKERRQ(ierr); ierr = MatMatMult(ProjLT,TempMat2,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&KProj); CHKERRQ(ierr); ierr = MatDestroy(&TempMat);CHKERRQ(ierr); ierr = MatDestroy(&TempMat2);CHKERRQ(ierr); If I run more than one process the error returns. [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Arguments are incompatible! [0]PETSC ERROR: MatMatMult requires A, mpidense, to be compatible with B, mpidense! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 5, Sat Dec 1 15:10:41 CST 2012 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Tue Dec 18 10:42:15 2012 [0]PETSC ERROR: Libraries linked from /opt/petsc/petsc-3.3-p5/linux-gcc-g++-mpich-mumps-complex-debug/lib [0]PETSC ERROR: Configure run at Tue Dec 18 10:09:32 2012 [0]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: MatMatMult() line 8617 in /opt/petsc/petsc-3.3-p5/src/mat/interface/matrix.c [0]PETSC ERROR: waigensolvprojforc() line 31 in src/examples/waigensolvprojforc.c If it matters I am using Petsc through SlepC-3.3-p3 -Nachiket On Mon, Dec 17, 2012 at 11:05 PM, Hong Zhang wrote: > Nachiket : > > Which version of petsc is used? Did you run the code in sequential? > > > ------------------------------------ > > [0]PETSC ERROR: Arguments are incompatible! 
> > [0]PETSC ERROR: MatMatMult requires A, seqdense, to be compatible with B, > > seqdense! > [0]PETSC ERROR: MatMatMult() line 8601 in > /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c > > The line (8601) does not match the latest petsc-3.3 and petsc-dev. > I cannot reproduce this error from the latest petsc-3.3 and petsc-dev. > > The error complains about a failure to find MatMatMult_SeqDense_SeqDense(). > Can you update to the latest petsc-3.3 or petsc-dev and see if your > code still crashes? > > Hong > > > I am trying to multiply the transpose of a matrix with another matrix > using > > matmatmult. The transpose operator is created using MatTranspose. I get > the > > error at the end of the email. Is this an error saying that cols of > matrix 1 > > and not equal to the rows of matrix 2? I checked and the rows and columns > > seem to allow matrix multiplication: Left = (54,1760), Right=(1760,54). > The > > snipped of the code which produces this is > > > > ierr = > > > MatMatMult(*KFullMat,ProjR,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&TempMat);CHKERRQ(ierr); > > ierr = MatGetSize(ProjLT,&nrowl,&ncoll); CHKERRQ(ierr); > > ierr = MatGetSize(TempMat,&nrowr,&ncolr); CHKERRQ(ierr); > > ierr = PetscPrintf(PETSC_COMM_WORLD,"Left = (%d,%d), > > Right=(%d,%d)\n",nrowl,ncoll,nrowr,ncolr); CHKERRQ(ierr); > > ierr = > MatMatMult(ProjLT,TempMat,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&KProj); > > CHKERRQ(ierr); > > ierr = MatDestroy(&TempMat); > > > > Thanks, > > > > -Nachiket > > > > [0]PETSC ERROR: --------------------- Error Message > > ------------------------------------ > > [0]PETSC ERROR: Arguments are incompatible! > > [0]PETSC ERROR: MatMatMult requires A, seqdense, to be compatible with B, > > seqdense! > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 > > CDT 2012 > > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > > [0]PETSC ERROR: See docs/index.html for manual pages. > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a > > linux-gcc named asd1.wai.com by gokhale Mon Dec 17 16:58:13 2012 > > [0]PETSC ERROR: Libraries linked from > > /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib > > [0]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 > > [0]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 > > --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ > > --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 > > --download-parmetis=1 --download-metis --download-scalapack=1 > > --download-blacs=1 --with-cmake=/usr/bin/cmake28 > --with-scalar-type=complex > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: MatMatMult() line 8601 in > > /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c > > [0]PETSC ERROR: waigensolvprojforc() line 33 in > > src/examples/waigensolvprojforc.c > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurish108 at gmail.com Tue Dec 18 20:17:54 2012 From: gaurish108 at gmail.com (Gaurish Telang) Date: Tue, 18 Dec 2012 21:17:54 -0500 Subject: [petsc-users] Simple query about GPU usage in PETSc Message-ID: I am trying out PETSc's GPU features for the first time. 
After skimming, a paper on the PETSc-GPU interface. http://www.stanford.edu/~vminden/docs/gpus.pdf I just wanted to confirm whether the following observation is correct. Suppose I want to solve Ax=b and set the PETSc vector- and matrix-type from the command-line Then to make my code run on the GPU, *all* I need to do is to (1) set the "-vec_type" at the command-line as "seqcusp" or "mpicusp" (depending on whether I am using a single/multiple GPU process ) (2) set the "-mat_type" at the command-line as "seqaijcusp" or " mpiaijcusp" (depending on whether I am using a single/multiple CPU process ) (3) Solving the system Ax=b is done the "usual" way (see below) i.e nothing CUDA specific. ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); ierr = PCSetType(pc,PCJACOBI);CHKERRQ(ierr); ierr = KSPSetTolerances(ksp,1.e-5,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); if (nonzeroguess) { PetscScalar p = .5; ierr = VecSet(x,p);CHKERRQ(ierr); ierr = KSPSetInitialGuessNonzero(ksp,PETSC_TRUE);CHKERRQ(ierr); } ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); (4) Looking at the type of the vector and the matrix, PETSc hands over the control to the corresponding CUSP solver. Thank you, Gaurish -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Dec 18 20:58:54 2012 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 18 Dec 2012 21:58:54 -0500 Subject: [petsc-users] Simple query about GPU usage in PETSc In-Reply-To: References: Message-ID: On Tue, Dec 18, 2012 at 9:17 PM, Gaurish Telang wrote: > I am trying out PETSc's GPU features for the first time. > > After skimming, a paper on the PETSc-GPU interface. > http://www.stanford.edu/~vminden/docs/gpus.pdf > > I just wanted to confirm whether the following observation is correct. > > Suppose I want to solve Ax=b and set the PETSc vector- and matrix-type > from the command-line > > Then to make my code run on the GPU, *all* I need to do is to > (1) set the "-vec_type" at the command-line as "seqcusp" or "mpicusp" > (depending on whether I am using a single/multiple GPU process ) > (2) set the "-mat_type" at the command-line as "seqaijcusp" or > "mpiaijcusp" (depending on whether I am using a single/multiple CPU process > ) > (3) Solving the system Ax=b is done the "usual" way (see below) i.e > nothing CUDA specific. > > ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); > ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); > ierr = PCSetType(pc,PCJACOBI);CHKERRQ(ierr); > ierr = > KSPSetTolerances(ksp,1.e-5,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); > > if (nonzeroguess) { > PetscScalar p = .5; > ierr = VecSet(x,p);CHKERRQ(ierr); > ierr = KSPSetInitialGuessNonzero(ksp,PETSC_TRUE);CHKERRQ(ierr); > } > ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > (4) Looking at the type of the vector and the matrix, PETSc hands over the > control to the corresponding CUSP solver. Yes, that should work. Matt > Thank you, > > Gaurish -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener From daniel.arndt at stud.uni-goettingen.de Wed Dec 19 09:00:22 2012 From: daniel.arndt at stud.uni-goettingen.de (Daniel Arndt) Date: Wed, 19 Dec 2012 16:00:22 +0100 Subject: [petsc-users] early convergence failure In-Reply-To: <7B5C8583-1A21-49C3-B8B3-FF708B84D4F8@mcs.anl.gov> References: <7B5C8583-1A21-49C3-B8B3-FF708B84D4F8@mcs.anl.gov> Message-ID: <50D1D686.8030307@stud.uni-goettingen.de> >>/ Thank you Barry for your suggestions. />/> //The error I get is now KSP_DIVERGED_INDEFINITE_PC. The matrix that I try to invert is actually symmetric and positive definite. I was not aware that this can lead to a indefinite preconditioner. / > Absolutely. Many preconditioners do not retain this feature even in exact precision and with numerical effects it can even appear unexpected. > By default BoomAMG doesn't retain this. Is there then a possibility to tell the BlockJacobi preconditioner to be positive definit as there is for the BoomerAMG preconditioner via -pc_hypre_boomeramg_relax_type_all symmetric-SOR/Jacobi? Bests Daniel // -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Dec 19 09:06:05 2012 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 19 Dec 2012 10:06:05 -0500 Subject: [petsc-users] early convergence failure In-Reply-To: <50D1D686.8030307@stud.uni-goettingen.de> References: <7B5C8583-1A21-49C3-B8B3-FF708B84D4F8@mcs.anl.gov> <50D1D686.8030307@stud.uni-goettingen.de> Message-ID: On Wed, Dec 19, 2012 at 10:00 AM, Daniel Arndt wrote: >>> Thank you Barry for your suggestions. >>> The error I get is now KSP_DIVERGED_INDEFINITE_PC. The matrix that I try >>> to invert is actually symmetric and positive definite. I was not aware that >>> this can lead to a indefinite preconditioner. > >> Absolutely. Many preconditioners do not retain this feature even in >> exact precision and with numerical effects it can even appear unexpected. >> By default BoomAMG doesn't retain this. > > Is there then a possibility to tell the BlockJacobi preconditioner to be > positive definit as there is for the BoomerAMG preconditioner via > -pc_hypre_boomeramg_relax_type_all symmetric-SOR/Jacobi? Block-Jacobi is just a container. You can choose the inner solver to respect this. Matt > Bests > Daniel > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From stefan.kurzbach at tuhh.de Wed Dec 19 10:25:09 2012 From: stefan.kurzbach at tuhh.de (Stefan Kurzbach) Date: Wed, 19 Dec 2012 17:25:09 +0100 Subject: [petsc-users] Direct Schur complement domain decomposition Message-ID: <002b01cdde05$6a2d0330$3e870990$@tuhh.de> Hello everybody, in my recent research on parallelization of a 2D unstructured flow model code I came upon a question on domain decomposition techniques in "grids". Maybe someone knows of any previous results on this? Typically, when doing large simulations with many unknowns, the problem is distributed to many computer nodes and solved in parallel by some iterative method. Many of these iterative methods boil down to a large number of distributed matrix-vector multiplications (in the order of the number of iterations). This means there are many synchronization points in the algorithms, which makes them tightly coupled. This has been found to work well on clusters with fast networks. 
Now my question: What if there is a small number of very powerful nodes (say less than 10), which are connected by a slow network, e.g. several computer clusters connected over the internet (some people call this "grid computing"). I expect that the traditional iterative methods will not be as efficient here (any references?). My guess is that a solution method with fewer synchronization points will work better, even though that method may be computationally more expensive than traditional methods. An example would be a domain composition approach with direct solution of the Schur complement on the interface. This requires that the interface size has to be small compared to the subdomain size. As this algorithm basically works in three decoupled phases (solve the subdomains for several right hand sides, assemble and solve the Schur complement system, correct the subdomain results) it should be suited well, but I have no idea how to test or otherwise prove it. Has anybody made any thoughts on this before, possibly dating back to the 80ies and 90ies, where slow networks were more common? Best regards Stefan -------------- next part -------------- An HTML attachment was scrubbed... URL: From dog at lanl.gov Wed Dec 19 15:54:17 2012 From: dog at lanl.gov (Gunter, David O) Date: Wed, 19 Dec 2012 21:54:17 +0000 Subject: [petsc-users] Compiling 3.3 for Open-MPI on a SLURM system Message-ID: I am trying to compile PETSc on a SLURM-based system using GCC, openmpi-1.6.3. Here's my configure line: $ ./configure --prefix=/tmp/dog/petsc-3.3-p5 --with-mpiexec=mpiexec configure bombs out on this test: TESTING: configureMPITypes from config.packages.MPI(/usr/aprojects/hpctools/dog/petsc/petsc-3.3-p5/config/BuildSystem/config/packages/MPI.py:230)srun: error: slurm_send_recv_rc_msg_only_one: Connection timed out srun: error: slurm_send_recv_rc_msg_only_one: Connection timed out It should not be trying to srun anything as we use mpiexec with Open-MPI. Any ideas? -david -- David Gunter HPC-3: Infrastructure Team Los Alamos National Laboratory From knepley at gmail.com Wed Dec 19 16:03:02 2012 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 19 Dec 2012 17:03:02 -0500 Subject: [petsc-users] Compiling 3.3 for Open-MPI on a SLURM system In-Reply-To: References: Message-ID: On Wed, Dec 19, 2012 at 4:54 PM, Gunter, David O wrote: > I am trying to compile PETSc on a SLURM-based system using GCC, openmpi-1.6.3. > > Here's my configure line: > > $ ./configure --prefix=/tmp/dog/petsc-3.3-p5 --with-mpiexec=mpiexec > > configure bombs out on this test: > > TESTING: configureMPITypes from config.packages.MPI(/usr/aprojects/hpctools/dog/petsc/petsc-3.3-p5/config/BuildSystem/config/packages/MPI.py:230)srun: error: slurm_send_recv_rc_msg_only_one: Connection timed out > srun: error: slurm_send_recv_rc_msg_only_one: Connection timed out > > It should not be trying to srun anything as we use mpiexec with Open-MPI. Any ideas? If you can't run anything, you need --with-batch for the configure. Thanks, Matt > -david > > -- > David Gunter > HPC-3: Infrastructure Team > Los Alamos National Laboratory > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener From balay at mcs.anl.gov Wed Dec 19 16:03:56 2012 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 19 Dec 2012 16:03:56 -0600 (CST) Subject: [petsc-users] Compiling 3.3 for Open-MPI on a SLURM system In-Reply-To: References: Message-ID: Perhaps mpiexec is invoking srun internally? The details should be in configure.log [petsc configure doesn't know about srun]. Satish On Wed, 19 Dec 2012, Gunter, David O wrote: > I am trying to compile PETSc on a SLURM-based system using GCC, openmpi-1.6.3. > > Here's my configure line: > > $ ./configure --prefix=/tmp/dog/petsc-3.3-p5 --with-mpiexec=mpiexec > > configure bombs out on this test: > > TESTING: configureMPITypes from config.packages.MPI(/usr/aprojects/hpctools/dog/petsc/petsc-3.3-p5/config/BuildSystem/config/packages/MPI.py:230)srun: error: slurm_send_recv_rc_msg_only_one: Connection timed out > srun: error: slurm_send_recv_rc_msg_only_one: Connection timed out > > It should not be trying to srun anything as we use mpiexec with Open-MPI. Any ideas? > > -david > > -- > David Gunter > HPC-3: Infrastructure Team > Los Alamos National Laboratory > > > > > From thomas.witkowski at tu-dresden.de Thu Dec 20 14:16:52 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 20 Dec 2012 21:16:52 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? Message-ID: <50D37234.2040205@tu-dresden.de> In my multilevel FETI-DP code, I have localized course matrices, which are defined on only a subset of all MPI tasks, typically between 4 and 64 tasks. The MatAIJ and the KSP objects are both defined on a MPI communicator, which is a subset of MPI::COMM_WORLD. The LU factorization of the matrices is computed with either MUMPS or superlu_dist, but both show some scaling property I really wonder of: When the overall problem size is increased, the solve with the LU factorization of the local matrices does not scale! But why not? I just increase the number of local matrices, but all of them are independent of each other. Some example: I use 64 cores, each coarse matrix is spanned by 4 cores so there are 16 MPI communicators with 16 coarse space matrices. The problem need to solve 192 times with the coarse space systems, and this takes together 0.09 seconds. Now I increase the number of cores to 256, but let the local coarse space be defined again on only 4 cores. Again, 192 solutions with these coarse spaces are required, but now this takes 0.24 seconds. The same for 1024 cores, and we are at 1.7 seconds for the local coarse space solver! For me, this is a total mystery! Any idea how to explain, debug and eventually how to resolve this problem? Thomas From Thomas.Witkowski at tu-dresden.de Thu Dec 20 14:19:59 2012 From: Thomas.Witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 20 Dec 2012 21:19:59 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? Message-ID: <20121220211959.9srb50dlcc4wgc4o@mail.zih.tu-dresden.de> In my multilevel FETI-DP code, I have localized course matrices, which are defined on only a subset of all MPI tasks, typically between 4 and 64 tasks. The MatAIJ and the KSP objects are both defined on a MPI communicator, which is a subset of MPI::COMM_WORLD. 
The LU factorization of the matrices is computed with either MUMPS or superlu_dist, but both show some scaling property I really wonder of: When the overall problem size is increased, the solve with the LU factorization of the local matrices does not scale! But why not? I just increase the number of local matrices, but all of them are independent of each other. Some example: I use 64 cores, each coarse matrix is spanned by 4 cores so there are 16 MPI communicators with 16 coarse space matrices. The problem need to solve 192 times with the coarse space systems, and this takes together 0.09 seconds. Now I increase the number of cores to 256, but let the local coarse space be defined again on only 4 cores. Again, 192 solutions with these coarse spaces are required, but now this takes 0.24 seconds. The same for 1024 cores, and we are at 1.7 seconds for the local coarse space solver! For me, this is a total mystery! Any idea how to explain, debug and eventually how to resolve this problem? Thomas From bsmith at mcs.anl.gov Thu Dec 20 14:23:45 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 20 Dec 2012 14:23:45 -0600 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <50D37234.2040205@tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> Message-ID: <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> Are you timing ONLY the time to factor and solve the subproblems? Or also the time to get the data to the collection of 4 cores at a time? If you are only using LU for these problems and not elsewhere in the code you can get the factorization and time from MatLUFactor() and MatSolve() or you can use stages to put this calculation in its own stage and use the MatLUFactor() and MatSolve() time from that stage. Also look at the load balancing column for the factorization and solve stage, it is well balanced? Barry On Dec 20, 2012, at 2:16 PM, Thomas Witkowski wrote: > In my multilevel FETI-DP code, I have localized course matrices, which are defined on only a subset of all MPI tasks, typically between 4 and 64 tasks. The MatAIJ and the KSP objects are both defined on a MPI communicator, which is a subset of MPI::COMM_WORLD. The LU factorization of the matrices is computed with either MUMPS or superlu_dist, but both show some scaling property I really wonder of: When the overall problem size is increased, the solve with the LU factorization of the local matrices does not scale! But why not? I just increase the number of local matrices, but all of them are independent of each other. Some example: I use 64 cores, each coarse matrix is spanned by 4 cores so there are 16 MPI communicators with 16 coarse space matrices. The problem need to solve 192 times with the coarse space systems, and this takes together 0.09 seconds. Now I increase the number of cores to 256, but let the local coarse space be defined again on only 4 cores. Again, 192 solutions with these coarse spaces are required, but now this takes 0.24 seconds. The same for 1024 cores, and we are at 1.7 seconds for the local coarse space solver! > > For me, this is a total mystery! Any idea how to explain, debug and eventually how to resolve this problem? > > Thomas From Thomas.Witkowski at tu-dresden.de Thu Dec 20 14:39:50 2012 From: Thomas.Witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 20 Dec 2012 21:39:50 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? 
In-Reply-To: <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> Message-ID: <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> I cannot use the information from log_summary, as I have three different LU factorizations and solve (local matrices and two hierarchies of coarse grids). Therefore, I use the following work around to get the timing of the solve I'm intrested in: MPI::COMM_WORLD.Barrier(); wtime = MPI::Wtime(); KSPSolve(*(data->ksp_schur_primal_local), tmp_primal, tmp_primal); FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); The factorization is done explicitly before with "KSPSetUp", so I can measure the time for LU factorization. It also does not scale! For 64 cores, I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, the local coarse space matrices defined on four cores have exactly the same number of rows and exactly the same number of non zero entries. So, from my point of view, the time should be absolutely constant. Thomas Zitat von Barry Smith : > > Are you timing ONLY the time to factor and solve the subproblems? > Or also the time to get the data to the collection of 4 cores at a > time? > > If you are only using LU for these problems and not elsewhere in > the code you can get the factorization and time from MatLUFactor() > and MatSolve() or you can use stages to put this calculation in its > own stage and use the MatLUFactor() and MatSolve() time from that > stage. > Also look at the load balancing column for the factorization and > solve stage, it is well balanced? > > Barry > > On Dec 20, 2012, at 2:16 PM, Thomas Witkowski > wrote: > >> In my multilevel FETI-DP code, I have localized course matrices, >> which are defined on only a subset of all MPI tasks, typically >> between 4 and 64 tasks. The MatAIJ and the KSP objects are both >> defined on a MPI communicator, which is a subset of >> MPI::COMM_WORLD. The LU factorization of the matrices is computed >> with either MUMPS or superlu_dist, but both show some scaling >> property I really wonder of: When the overall problem size is >> increased, the solve with the LU factorization of the local >> matrices does not scale! But why not? I just increase the number of >> local matrices, but all of them are independent of each other. >> Some example: I use 64 cores, each coarse matrix is spanned by 4 >> cores so there are 16 MPI communicators with 16 coarse space >> matrices. The problem need to solve 192 times with the coarse >> space systems, and this takes together 0.09 seconds. Now I >> increase the number of cores to 256, but let the local coarse >> space be defined again on only 4 cores. Again, 192 solutions with >> these coarse spaces are required, but now this takes 0.24 seconds. >> The same for 1024 cores, and we are at 1.7 seconds for the local >> coarse space solver! >> >> For me, this is a total mystery! Any idea how to explain, debug and >> eventually how to resolve this problem? >> >> Thomas > > From jack.poulson at gmail.com Thu Dec 20 14:53:34 2012 From: jack.poulson at gmail.com (Jack Poulson) Date: Thu, 20 Dec 2012 14:53:34 -0600 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? 
In-Reply-To: <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> Message-ID: Hi Thomas, Network topology is important. Since most machines are not fully connected, random subsets of four processes will become more scattered about the cluster as you increase your total number of processes. Jack On Dec 20, 2012 12:39 PM, "Thomas Witkowski" wrote: > I cannot use the information from log_summary, as I have three different > LU factorizations and solve (local matrices and two hierarchies of coarse > grids). Therefore, I use the following work around to get the timing of the > solve I'm intrested in: > > MPI::COMM_WORLD.Barrier(); > wtime = MPI::Wtime(); > KSPSolve(*(data->ksp_schur_**primal_local), tmp_primal, tmp_primal); > FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); > > The factorization is done explicitly before with "KSPSetUp", so I can > measure the time for LU factorization. It also does not scale! For 64 > cores, I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all > calculations, the local coarse space matrices defined on four cores have > exactly the same number of rows and exactly the same number of non zero > entries. So, from my point of view, the time should be absolutely constant. > > Thomas > > Zitat von Barry Smith : > > >> Are you timing ONLY the time to factor and solve the subproblems? Or >> also the time to get the data to the collection of 4 cores at a time? >> >> If you are only using LU for these problems and not elsewhere in the >> code you can get the factorization and time from MatLUFactor() and >> MatSolve() or you can use stages to put this calculation in its own stage >> and use the MatLUFactor() and MatSolve() time from that stage. >> Also look at the load balancing column for the factorization and solve >> stage, it is well balanced? >> >> Barry >> >> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski < >> thomas.witkowski at tu-dresden.**de > wrote: >> >> In my multilevel FETI-DP code, I have localized course matrices, which >>> are defined on only a subset of all MPI tasks, typically between 4 and 64 >>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>> communicator, which is a subset of MPI::COMM_WORLD. The LU factorization >>> of the matrices is computed with either MUMPS or superlu_dist, but both >>> show some scaling property I really wonder of: When the overall problem >>> size is increased, the solve with the LU factorization of the local >>> matrices does not scale! But why not? I just increase the number of local >>> matrices, but all of them are independent of each other. Some example: I >>> use 64 cores, each coarse matrix is spanned by 4 cores so there are 16 MPI >>> communicators with 16 coarse space matrices. The problem need to solve 192 >>> times with the coarse space systems, and this takes together 0.09 seconds. >>> Now I increase the number of cores to 256, but let the local coarse space >>> be defined again on only 4 cores. Again, 192 solutions with these coarse >>> spaces are required, but now this takes 0.24 seconds. The same for 1024 >>> cores, and we are at 1.7 seconds for the local coarse space solver! >>> >>> For me, this is a total mystery! Any idea how to explain, debug and >>> eventually how to resolve this problem? >>> >>> Thomas >>> >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Thomas.Witkowski at tu-dresden.de Thu Dec 20 15:01:29 2012 From: Thomas.Witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 20 Dec 2012 22:01:29 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> Message-ID: <20121220220129.5f4h5pbq8gwsc0w4@mail.zih.tu-dresden.de> Jack, I also considered this problem. The 4 MPI tasks of each coarse space matrix should run all on one node (each node contains 4 dual core CPUs). I'm not 100% sure, but I discussed this with the administrators of the system. The system should schedule always the first 8 ranks to the first node, and so on. And the coarse space matrices are build on ranks 0-3, 4-7 ... I'm running at the moment some benchmarks, where I replaced the local LU factorization from using UMFPACK to MUMPS. Each matrix and the corresponding ksp object are defined on PETSC_COMM_SELF and the problem is perfectly balanced (the grid is a unit square uniformly refined). Lets see... Thomas Zitat von Jack Poulson : > Hi Thomas, > > Network topology is important. Since most machines are not fully connected, > random subsets of four processes will become more scattered about the > cluster as you increase your total number of processes. > > Jack > On Dec 20, 2012 12:39 PM, "Thomas Witkowski" > wrote: > >> I cannot use the information from log_summary, as I have three different >> LU factorizations and solve (local matrices and two hierarchies of coarse >> grids). Therefore, I use the following work around to get the timing of the >> solve I'm intrested in: >> >> MPI::COMM_WORLD.Barrier(); >> wtime = MPI::Wtime(); >> KSPSolve(*(data->ksp_schur_**primal_local), tmp_primal, tmp_primal); >> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >> >> The factorization is done explicitly before with "KSPSetUp", so I can >> measure the time for LU factorization. It also does not scale! For 64 >> cores, I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all >> calculations, the local coarse space matrices defined on four cores have >> exactly the same number of rows and exactly the same number of non zero >> entries. So, from my point of view, the time should be absolutely constant. >> >> Thomas >> >> Zitat von Barry Smith : >> >> >>> Are you timing ONLY the time to factor and solve the subproblems? Or >>> also the time to get the data to the collection of 4 cores at a time? >>> >>> If you are only using LU for these problems and not elsewhere in the >>> code you can get the factorization and time from MatLUFactor() and >>> MatSolve() or you can use stages to put this calculation in its own stage >>> and use the MatLUFactor() and MatSolve() time from that stage. >>> Also look at the load balancing column for the factorization and solve >>> stage, it is well balanced? >>> >>> Barry >>> >>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski < >>> thomas.witkowski at tu-dresden.**de > wrote: >>> >>> In my multilevel FETI-DP code, I have localized course matrices, which >>>> are defined on only a subset of all MPI tasks, typically between 4 and 64 >>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>> communicator, which is a subset of MPI::COMM_WORLD. 
The LU factorization >>>> of the matrices is computed with either MUMPS or superlu_dist, but both >>>> show some scaling property I really wonder of: When the overall problem >>>> size is increased, the solve with the LU factorization of the local >>>> matrices does not scale! But why not? I just increase the number >>>> of local >>>> matrices, but all of them are independent of each other. Some example: I >>>> use 64 cores, each coarse matrix is spanned by 4 cores so there >>>> are 16 MPI >>>> communicators with 16 coarse space matrices. The problem need to >>>> solve 192 >>>> times with the coarse space systems, and this takes together >>>> 0.09 seconds. >>>> Now I increase the number of cores to 256, but let the local coarse space >>>> be defined again on only 4 cores. Again, 192 solutions with these coarse >>>> spaces are required, but now this takes 0.24 seconds. The same for 1024 >>>> cores, and we are at 1.7 seconds for the local coarse space solver! >>>> >>>> For me, this is a total mystery! Any idea how to explain, debug and >>>> eventually how to resolve this problem? >>>> >>>> Thomas >>>> >>> >>> >>> >> >> > From jack.poulson at gmail.com Thu Dec 20 15:07:18 2012 From: jack.poulson at gmail.com (Jack Poulson) Date: Thu, 20 Dec 2012 15:07:18 -0600 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <20121220220129.5f4h5pbq8gwsc0w4@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <20121220220129.5f4h5pbq8gwsc0w4@mail.zih.tu-dresden.de> Message-ID: Hi Thomas, Assuming this is not the issue (it is probably worth explicitly measuring), it is also important to ensure that the sparsity pattern is preserved, not just the number of nonzeros per row. A sparse matrix with random nonzero locations is much more expensive to factor than one with entries near the diagonal. Jack On Dec 20, 2012 1:01 PM, "Thomas Witkowski" wrote: > Jack, I also considered this problem. The 4 MPI tasks of each coarse space > matrix should run all on one node (each node contains 4 dual core CPUs). > I'm not 100% sure, but I discussed this with the administrators of the > system. The system should schedule always the first 8 ranks to the first > node, and so on. And the coarse space matrices are build on ranks 0-3, 4-7 > ... > > I'm running at the moment some benchmarks, where I replaced the local LU > factorization from using UMFPACK to MUMPS. Each matrix and the > corresponding ksp object are defined on PETSC_COMM_SELF and the problem is > perfectly balanced (the grid is a unit square uniformly refined). Lets > see... > > Thomas > > Zitat von Jack Poulson : > > Hi Thomas, >> >> Network topology is important. Since most machines are not fully >> connected, >> random subsets of four processes will become more scattered about the >> cluster as you increase your total number of processes. >> >> Jack >> On Dec 20, 2012 12:39 PM, "Thomas Witkowski" < >> Thomas.Witkowski at tu-dresden.**de > >> wrote: >> >> I cannot use the information from log_summary, as I have three different >>> LU factorizations and solve (local matrices and two hierarchies of coarse >>> grids). 
Therefore, I use the following work around to get the timing of >>> the >>> solve I'm intrested in: >>> >>> MPI::COMM_WORLD.Barrier(); >>> wtime = MPI::Wtime(); >>> KSPSolve(*(data->ksp_schur_****primal_local), tmp_primal, >>> tmp_primal); >>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>> >>> The factorization is done explicitly before with "KSPSetUp", so I can >>> measure the time for LU factorization. It also does not scale! For 64 >>> cores, I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all >>> calculations, the local coarse space matrices defined on four cores have >>> exactly the same number of rows and exactly the same number of non zero >>> entries. So, from my point of view, the time should be absolutely >>> constant. >>> >>> Thomas >>> >>> Zitat von Barry Smith : >>> >>> >>> Are you timing ONLY the time to factor and solve the subproblems? Or >>>> also the time to get the data to the collection of 4 cores at a time? >>>> >>>> If you are only using LU for these problems and not elsewhere in the >>>> code you can get the factorization and time from MatLUFactor() and >>>> MatSolve() or you can use stages to put this calculation in its own >>>> stage >>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>> Also look at the load balancing column for the factorization and solve >>>> stage, it is well balanced? >>>> >>>> Barry >>>> >>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski < >>>> thomas.witkowski at tu-dresden.****de >> >>>> wrote: >>>> >>>> In my multilevel FETI-DP code, I have localized course matrices, which >>>> >>>>> are defined on only a subset of all MPI tasks, typically between 4 >>>>> and 64 >>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU >>>>> factorization >>>>> of the matrices is computed with either MUMPS or superlu_dist, but >>>>> both >>>>> show some scaling property I really wonder of: When the overall >>>>> problem >>>>> size is increased, the solve with the LU factorization of the local >>>>> matrices does not scale! But why not? I just increase the number of >>>>> local >>>>> matrices, but all of them are independent of each other. Some >>>>> example: I >>>>> use 64 cores, each coarse matrix is spanned by 4 cores so there are >>>>> 16 MPI >>>>> communicators with 16 coarse space matrices. The problem need to >>>>> solve 192 >>>>> times with the coarse space systems, and this takes together 0.09 >>>>> seconds. >>>>> Now I increase the number of cores to 256, but let the local coarse >>>>> space >>>>> be defined again on only 4 cores. Again, 192 solutions with these >>>>> coarse >>>>> spaces are required, but now this takes 0.24 seconds. The same for >>>>> 1024 >>>>> cores, and we are at 1.7 seconds for the local coarse space solver! >>>>> >>>>> For me, this is a total mystery! Any idea how to explain, debug and >>>>> eventually how to resolve this problem? >>>>> >>>>> Thomas >>>>> >>>>> >>>> >>>> >>>> >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Thomas.Witkowski at tu-dresden.de Thu Dec 20 15:53:12 2012 From: Thomas.Witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 20 Dec 2012 22:53:12 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? 
In-Reply-To: <20121220220129.5f4h5pbq8gwsc0w4@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <20121220220129.5f4h5pbq8gwsc0w4@mail.zih.tu-dresden.de> Message-ID: <20121220225312.h78wmlgv4g8sggco@mail.zih.tu-dresden.de> So, I run the benchmark for 16, 64, 256 and 1024 MPI tasks. I replaced a local UMFPACK LU factorization of interior matrices and the 192 solves with them by MUMPS. It all three cases, the size, the structure and the values of the matrices are all the same. As expected, with UMFPACK both times for factorization and solve are the same for different number of cores. The MUMPS, this is not the case: 16 cores: factorization: 3.87 sec solves: 46 sec 64 cores: factorization: 4.29 sec solves: 70 sec 256 cores: factorization: 6.11 sec solves: 254 sec 1024 cores: factorization: 25.64 sec solves: forever :) This is really baaad! There is no communication (PETSC_COMM_SELF, MatAIJSeq) and all matrices are of the same size. What's going on here? May be its possible to reproduce this scenario with one of the PETSc examples? Thomas Zitat von Thomas Witkowski : > Jack, I also considered this problem. The 4 MPI tasks of each coarse > space matrix should run all on one node (each node contains 4 dual core > CPUs). I'm not 100% sure, but I discussed this with the administrators > of the system. The system should schedule always the first 8 ranks to > the first node, and so on. And the coarse space matrices are build on > ranks 0-3, 4-7 ... > > I'm running at the moment some benchmarks, where I replaced the local > LU factorization from using UMFPACK to MUMPS. Each matrix and the > corresponding ksp object are defined on PETSC_COMM_SELF and the problem > is perfectly balanced (the grid is a unit square uniformly refined). > Lets see... > > Thomas > > Zitat von Jack Poulson : > >> Hi Thomas, >> >> Network topology is important. Since most machines are not fully connected, >> random subsets of four processes will become more scattered about the >> cluster as you increase your total number of processes. >> >> Jack >> On Dec 20, 2012 12:39 PM, "Thomas Witkowski" >> >> wrote: >> >>> I cannot use the information from log_summary, as I have three different >>> LU factorizations and solve (local matrices and two hierarchies of coarse >>> grids). Therefore, I use the following work around to get the timing of the >>> solve I'm intrested in: >>> >>> MPI::COMM_WORLD.Barrier(); >>> wtime = MPI::Wtime(); >>> KSPSolve(*(data->ksp_schur_**primal_local), tmp_primal, tmp_primal); >>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>> >>> The factorization is done explicitly before with "KSPSetUp", so I can >>> measure the time for LU factorization. It also does not scale! For 64 >>> cores, I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all >>> calculations, the local coarse space matrices defined on four cores have >>> exactly the same number of rows and exactly the same number of non zero >>> entries. So, from my point of view, the time should be absolutely constant. >>> >>> Thomas >>> >>> Zitat von Barry Smith : >>> >>> >>>> Are you timing ONLY the time to factor and solve the subproblems? Or >>>> also the time to get the data to the collection of 4 cores at a time? 
>>>> >>>> If you are only using LU for these problems and not elsewhere in the >>>> code you can get the factorization and time from MatLUFactor() and >>>> MatSolve() or you can use stages to put this calculation in its own stage >>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>> Also look at the load balancing column for the factorization and solve >>>> stage, it is well balanced? >>>> >>>> Barry >>>> >>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski < >>>> thomas.witkowski at tu-dresden.**de > wrote: >>>> >>>> In my multilevel FETI-DP code, I have localized course matrices, which >>>>> are defined on only a subset of all MPI tasks, typically >>>>> between 4 and 64 >>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU factorization >>>>> of the matrices is computed with either MUMPS or superlu_dist, but both >>>>> show some scaling property I really wonder of: When the overall problem >>>>> size is increased, the solve with the LU factorization of the local >>>>> matrices does not scale! But why not? I just increase the number >>>>> of local >>>>> matrices, but all of them are independent of each other. Some example: I >>>>> use 64 cores, each coarse matrix is spanned by 4 cores so there >>>>> are 16 MPI >>>>> communicators with 16 coarse space matrices. The problem need >>>>> to solve 192 >>>>> times with the coarse space systems, and this takes together >>>>> 0.09 seconds. >>>>> Now I increase the number of cores to 256, but let the local >>>>> coarse space >>>>> be defined again on only 4 cores. Again, 192 solutions with these coarse >>>>> spaces are required, but now this takes 0.24 seconds. The same for 1024 >>>>> cores, and we are at 1.7 seconds for the local coarse space solver! >>>>> >>>>> For me, this is a total mystery! Any idea how to explain, debug and >>>>> eventually how to resolve this problem? >>>>> >>>>> Thomas >>>>> >>>> >>>> >>>> >>> >>> >> From knepley at gmail.com Thu Dec 20 19:19:45 2012 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 20 Dec 2012 20:19:45 -0500 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> Message-ID: On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski wrote: > I cannot use the information from log_summary, as I have three different LU > factorizations and solve (local matrices and two hierarchies of coarse > grids). Therefore, I use the following work around to get the timing of the > solve I'm intrested in: You misunderstand how to use logging. You just put these thing in separate stages. Stages represent parts of the code over which events are aggregated. Matt > MPI::COMM_WORLD.Barrier(); > wtime = MPI::Wtime(); > KSPSolve(*(data->ksp_schur_primal_local), tmp_primal, tmp_primal); > FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); > > The factorization is done explicitly before with "KSPSetUp", so I can > measure the time for LU factorization. It also does not scale! For 64 cores, > I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, the > local coarse space matrices defined on four cores have exactly the same > number of rows and exactly the same number of non zero entries. So, from my > point of view, the time should be absolutely constant. 
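As a concrete reference for the separate-stages suggestion above, here is a minimal sketch against the PETSc 3.3-era C API; the stage name, ksp_coarse and the wrapper function are illustrative placeholders, not code from this thread:

  #include <petscksp.h>

  /* Time the repeated coarse-space solves inside their own logging stage,
     so that -log_summary reports KSPSolve/MatSolve for this phase separately. */
  static PetscErrorCode CoarseSolves(KSP ksp_coarse, Vec rhs, Vec sol, PetscInt nsolves)
  {
    PetscErrorCode ierr;
    PetscLogStage  stage;
    PetscInt       i;

    PetscFunctionBegin;
    ierr = PetscLogStageRegister("Coarse space solves", &stage);CHKERRQ(ierr);
    ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
    for (i = 0; i < nsolves; i++) {
      ierr = KSPSolve(ksp_coarse, rhs, sol);CHKERRQ(ierr);
    }
    ierr = PetscLogStagePop();CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }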
> > Thomas > > Zitat von Barry Smith : > > >> >> Are you timing ONLY the time to factor and solve the subproblems? Or >> also the time to get the data to the collection of 4 cores at a time? >> >> If you are only using LU for these problems and not elsewhere in the >> code you can get the factorization and time from MatLUFactor() and >> MatSolve() or you can use stages to put this calculation in its own stage >> and use the MatLUFactor() and MatSolve() time from that stage. >> Also look at the load balancing column for the factorization and solve >> stage, it is well balanced? >> >> Barry >> >> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >> wrote: >> >>> In my multilevel FETI-DP code, I have localized course matrices, which >>> are defined on only a subset of all MPI tasks, typically between 4 and 64 >>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>> communicator, which is a subset of MPI::COMM_WORLD. The LU factorization of >>> the matrices is computed with either MUMPS or superlu_dist, but both show >>> some scaling property I really wonder of: When the overall problem size is >>> increased, the solve with the LU factorization of the local matrices does >>> not scale! But why not? I just increase the number of local matrices, but >>> all of them are independent of each other. Some example: I use 64 cores, >>> each coarse matrix is spanned by 4 cores so there are 16 MPI communicators >>> with 16 coarse space matrices. The problem need to solve 192 times with the >>> coarse space systems, and this takes together 0.09 seconds. Now I increase >>> the number of cores to 256, but let the local coarse space be defined again >>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>> required, but now this takes 0.24 seconds. The same for 1024 cores, and we >>> are at 1.7 seconds for the local coarse space solver! >>> >>> For me, this is a total mystery! Any idea how to explain, debug and >>> eventually how to resolve this problem? >>> >>> Thomas >> >> >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From aldo.bonfiglioli at unibas.it Fri Dec 21 03:29:51 2012 From: aldo.bonfiglioli at unibas.it (Aldo Bonfiglioli) Date: Fri, 21 Dec 2012 10:29:51 +0100 Subject: [petsc-users] PetscKernel_A_gets_inverse_A_ Message-ID: <50D42C0F.2090301@unibas.it> Dear all, would it be possible to have a unified interface (also Fortran callable) to the PetscKernel_A_gets_inverse_A_ routines? I find them very useful within my own piece of Fortran code to solve small dense linear system (which I have to do very frequently). I have my own interface, at present, but I need to change it as needed when a new PETSc version is released. Regards, Aldo -- Dr. Aldo Bonfiglioli Associate professor of Fluid Flow Machinery Scuola di Ingegneria Universita' della Basilicata V.le dell'Ateneo lucano, 10 85100 Potenza ITALY tel:+39.0971.205203 fax:+39.0971.205215 Publications list From thomas.witkowski at tu-dresden.de Fri Dec 21 03:36:02 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Fri, 21 Dec 2012 10:36:02 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? 
In-Reply-To: References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> Message-ID: <50D42D82.10603@tu-dresden.de> Okay, I did a similar benchmark now with PETSc's event logging: UMFPACK 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 MUMPS 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 As you see, the local solves with UMFPACK have nearly constant time with increasing number of subdomains. This is what I expect. The I replace UMFPACK by MUMPS and I see increasing time for local solves. In the last columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's column increases here from 75 to 82. What does this mean? Thomas Am 21.12.2012 02:19, schrieb Matthew Knepley: > On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski > wrote: >> I cannot use the information from log_summary, as I have three different LU >> factorizations and solve (local matrices and two hierarchies of coarse >> grids). Therefore, I use the following work around to get the timing of the >> solve I'm intrested in: > You misunderstand how to use logging. You just put these thing in > separate stages. Stages represent > parts of the code over which events are aggregated. > > Matt > >> MPI::COMM_WORLD.Barrier(); >> wtime = MPI::Wtime(); >> KSPSolve(*(data->ksp_schur_primal_local), tmp_primal, tmp_primal); >> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >> >> The factorization is done explicitly before with "KSPSetUp", so I can >> measure the time for LU factorization. It also does not scale! For 64 cores, >> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, the >> local coarse space matrices defined on four cores have exactly the same >> number of rows and exactly the same number of non zero entries. So, from my >> point of view, the time should be absolutely constant. >> >> Thomas >> >> Zitat von Barry Smith : >> >> >>> Are you timing ONLY the time to factor and solve the subproblems? Or >>> also the time to get the data to the collection of 4 cores at a time? >>> >>> If you are only using LU for these problems and not elsewhere in the >>> code you can get the factorization and time from MatLUFactor() and >>> MatSolve() or you can use stages to put this calculation in its own stage >>> and use the MatLUFactor() and MatSolve() time from that stage. >>> Also look at the load balancing column for the factorization and solve >>> stage, it is well balanced? >>> >>> Barry >>> >>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >>> wrote: >>> >>>> In my multilevel FETI-DP code, I have localized course matrices, which >>>> are defined on only a subset of all MPI tasks, typically between 4 and 64 >>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>> communicator, which is a subset of MPI::COMM_WORLD. 
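The "Local solve" rows in the tables above look like a user-registered logging event; a sketch of how such an event can be set up with the 3.3-era API follows (the event name and the bracketed call are illustrative, not necessarily how the numbers above were produced):

  #include <petscksp.h>

  /* Register a custom event once, then bracket each local solve with it;
     -log_summary then prints a single aggregated "Local solve" line. */
  static PetscLogEvent LOCAL_SOLVE_EVENT;

  static PetscErrorCode RegisterLocalSolveEvent(void)
  {
    PetscErrorCode ierr;
    PetscFunctionBegin;
    ierr = PetscLogEventRegister("Local solve", KSP_CLASSID, &LOCAL_SOLVE_EVENT);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  static PetscErrorCode TimedLocalSolve(KSP ksp, Vec b, Vec x)
  {
    PetscErrorCode ierr;
    PetscFunctionBegin;
    ierr = PetscLogEventBegin(LOCAL_SOLVE_EVENT, 0, 0, 0, 0);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = PetscLogEventEnd(LOCAL_SOLVE_EVENT, 0, 0, 0, 0);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }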
The LU factorization of >>>> the matrices is computed with either MUMPS or superlu_dist, but both show >>>> some scaling property I really wonder of: When the overall problem size is >>>> increased, the solve with the LU factorization of the local matrices does >>>> not scale! But why not? I just increase the number of local matrices, but >>>> all of them are independent of each other. Some example: I use 64 cores, >>>> each coarse matrix is spanned by 4 cores so there are 16 MPI communicators >>>> with 16 coarse space matrices. The problem need to solve 192 times with the >>>> coarse space systems, and this takes together 0.09 seconds. Now I increase >>>> the number of cores to 256, but let the local coarse space be defined again >>>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>>> required, but now this takes 0.24 seconds. The same for 1024 cores, and we >>>> are at 1.7 seconds for the local coarse space solver! >>>> >>>> For me, this is a total mystery! Any idea how to explain, debug and >>>> eventually how to resolve this problem? >>>> >>>> Thomas >>> >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener From aldo.bonfiglioli at unibas.it Fri Dec 21 06:04:31 2012 From: aldo.bonfiglioli at unibas.it (Aldo Bonfiglioli) Date: Fri, 21 Dec 2012 13:04:31 +0100 Subject: [petsc-users] VecSetBlockSize with release 3.3 Message-ID: <50D4504F.5010105@unibas.it> Dear all, I am in the process of upgrading from 3.2 to 3.3. I am a little bit puzzled by the following change: > VecSetBlockSize() cannot be called after VecCreateSeq() or > VecCreateMPI() and must be called before VecSetUp() or > VecSetFromOptions() or before either VecSetType() or VecSetSizes() With the earlier release I used to do the following: CALL VecCreateSeq(PETSC_COMM_SELF,NPOIN*NOFVAR,DT,IFAIL) C C IF(NOFVAR.GT.1) CALL VecSetBlockSize(DT,NOFVAR,IFAIL) with 3.3 it looks like the following is required : CALL VecCreate(PETSC_COMM_SELF,DT,IFAIL) CALL VecSetType(DT,VECSEQ,IFAIL) CALL VecSetBlockSize(DT,NOFVAR,IFAIL) CALL VecSetSizes(DT,NPOIN*NOFVAR,PETSC_DECIDE,IFAIL) Is there a simpler (i.e. less library calls) way to achieve the same result? Regards, Aldo -- Dr. Aldo Bonfiglioli Associate professor of Fluid Flow Machinery Scuola di Ingegneria Universita' della Basilicata V.le dell'Ateneo lucano, 10 85100 Potenza ITALY tel:+39.0971.205203 fax:+39.0971.205215 Publications list From knepley at gmail.com Fri Dec 21 06:52:04 2012 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 21 Dec 2012 07:52:04 -0500 Subject: [petsc-users] VecSetBlockSize with release 3.3 In-Reply-To: <50D4504F.5010105@unibas.it> References: <50D4504F.5010105@unibas.it> Message-ID: On Fri, Dec 21, 2012 at 7:04 AM, Aldo Bonfiglioli wrote: > Dear all, > I am in the process of upgrading from 3.2 to 3.3. 
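Aldo's ordering question is perhaps easiest to see next to a C version of the same sequence; the following is only an illustrative sketch mirroring the Fortran calls he quotes (npoin and nofvar are placeholders), and Matt's reply to the question continues just below:

  #include <petscvec.h>

  /* Sketch of the PETSc 3.3 ordering: the block size is set after the type
     but before the sizes, mirroring the Fortran calls quoted above. */
  static PetscErrorCode CreateBlockedVec(PetscInt npoin, PetscInt nofvar, Vec *dt)
  {
    PetscErrorCode ierr;
    PetscFunctionBegin;
    ierr = VecCreate(PETSC_COMM_SELF, dt);CHKERRQ(ierr);
    ierr = VecSetType(*dt, VECSEQ);CHKERRQ(ierr);
    ierr = VecSetBlockSize(*dt, nofvar);CHKERRQ(ierr);
    ierr = VecSetSizes(*dt, npoin*nofvar, PETSC_DECIDE);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }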
> > I am a little bit puzzled by the following change: >> VecSetBlockSize() cannot be called after VecCreateSeq() or >> VecCreateMPI() and must be called before VecSetUp() or >> VecSetFromOptions() or before either VecSetType() or VecSetSizes() > With the earlier release I used to do the following: > > CALL VecCreateSeq(PETSC_COMM_SELF,NPOIN*NOFVAR,DT,IFAIL) > C > C > IF(NOFVAR.GT.1) CALL VecSetBlockSize(DT,NOFVAR,IFAIL) > > with 3.3 it looks like the following is required : > > > CALL VecCreate(PETSC_COMM_SELF,DT,IFAIL) > CALL VecSetType(DT,VECSEQ,IFAIL) > CALL VecSetBlockSize(DT,NOFVAR,IFAIL) > CALL VecSetSizes(DT,NPOIN*NOFVAR,PETSC_DECIDE,IFAIL) > > Is there a simpler (i.e. less library calls) way to achieve the same result? No, there is a complicated set of dependencies here for setup. We discussed this and could not find an easier way to do it. Personally, I would never call SetType() in my code, only VecSetFromOptions(). Also, we call thee functions very rarely, since we almost always use VecDuplicate(), DMGetGlobal/LocalVector(), etc. Matt > Regards, > Aldo > -- > Dr. Aldo Bonfiglioli > Associate professor of Fluid Flow Machinery > Scuola di Ingegneria > Universita' della Basilicata > V.le dell'Ateneo lucano, 10 85100 Potenza ITALY > tel:+39.0971.205203 fax:+39.0971.205215 > > > Publications list -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From jedbrown at mcs.anl.gov Fri Dec 21 08:08:14 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Fri, 21 Dec 2012 07:08:14 -0700 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <50D42D82.10603@tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> Message-ID: MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). What MPI implementation have you been using? Is the behavior different with a different implementation? On Fri, Dec 21, 2012 at 2:36 AM, Thomas Witkowski < thomas.witkowski at tu-dresden.de> wrote: > Okay, I did a similar benchmark now with PETSc's event logging: > > UMFPACK > 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 > 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 > 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 > 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 > 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 0.0e+00 > 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 > > MUMPS > 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 0.0e+00 > 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 > 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 0.0e+00 > 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 > 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 0.0e+00 > 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 > > > As you see, the local solves with UMFPACK have nearly constant time with > increasing number of subdomains. This is what I expect. The I replace > UMFPACK by MUMPS and I see increasing time for local solves. In the last > columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's column > increases here from 75 to 82. What does this mean? 
> > Thomas > > Am 21.12.2012 02:19, schrieb Matthew Knepley: > > On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski >> > >> wrote: >> >>> I cannot use the information from log_summary, as I have three different >>> LU >>> factorizations and solve (local matrices and two hierarchies of coarse >>> grids). Therefore, I use the following work around to get the timing of >>> the >>> solve I'm intrested in: >>> >> You misunderstand how to use logging. You just put these thing in >> separate stages. Stages represent >> parts of the code over which events are aggregated. >> >> Matt >> >> MPI::COMM_WORLD.Barrier(); >>> wtime = MPI::Wtime(); >>> KSPSolve(*(data->ksp_schur_**primal_local), tmp_primal, >>> tmp_primal); >>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>> >>> The factorization is done explicitly before with "KSPSetUp", so I can >>> measure the time for LU factorization. It also does not scale! For 64 >>> cores, >>> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, >>> the >>> local coarse space matrices defined on four cores have exactly the same >>> number of rows and exactly the same number of non zero entries. So, from >>> my >>> point of view, the time should be absolutely constant. >>> >>> Thomas >>> >>> Zitat von Barry Smith : >>> >>> >>> Are you timing ONLY the time to factor and solve the subproblems? Or >>>> also the time to get the data to the collection of 4 cores at a time? >>>> >>>> If you are only using LU for these problems and not elsewhere in >>>> the >>>> code you can get the factorization and time from MatLUFactor() and >>>> MatSolve() or you can use stages to put this calculation in its own >>>> stage >>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>> Also look at the load balancing column for the factorization and solve >>>> stage, it is well balanced? >>>> >>>> Barry >>>> >>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >>>> > >>>> wrote: >>>> >>>> In my multilevel FETI-DP code, I have localized course matrices, which >>>>> are defined on only a subset of all MPI tasks, typically between 4 >>>>> and 64 >>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU >>>>> factorization of >>>>> the matrices is computed with either MUMPS or superlu_dist, but both >>>>> show >>>>> some scaling property I really wonder of: When the overall problem >>>>> size is >>>>> increased, the solve with the LU factorization of the local matrices >>>>> does >>>>> not scale! But why not? I just increase the number of local matrices, >>>>> but >>>>> all of them are independent of each other. Some example: I use 64 >>>>> cores, >>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI >>>>> communicators >>>>> with 16 coarse space matrices. The problem need to solve 192 times >>>>> with the >>>>> coarse space systems, and this takes together 0.09 seconds. Now I >>>>> increase >>>>> the number of cores to 256, but let the local coarse space be defined >>>>> again >>>>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>>>> required, but now this takes 0.24 seconds. The same for 1024 cores, >>>>> and we >>>>> are at 1.7 seconds for the local coarse space solver! >>>>> >>>>> For me, this is a total mystery! Any idea how to explain, debug and >>>>> eventually how to resolve this problem? 
>>>>> >>>>> Thomas >>>>> >>>> >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Thomas.Witkowski at tu-dresden.de Fri Dec 21 09:51:12 2012 From: Thomas.Witkowski at tu-dresden.de (Thomas Witkowski) Date: Fri, 21 Dec 2012 16:51:12 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> Message-ID: <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> I use a modified MPICH version. On the system I use for these benchmarks I cannot use another MPI library. I'm not fixed to MUMPS. Superlu_dist, for example, works also perfectly for this. But there is still the following problem I cannot solve: When I increase the number of coarse space matrices, there seems to be no scaling direct solver for this. Just to summaries: - one coarse space matrix is created always by one "cluster" consisting of four subdomanins/MPI tasks - the four tasks are always local to one node, thus inter-node network communication is not required for computing factorization and solve - independent of the number of cluster, the coarse space matrices are the same, have the same number of rows, nnz structure but possibly different values - there is NO load unbalancing - the matrices must be factorized and there are a lot of solves (> 100) with them It should be pretty clear, that computing LU factorization and solving with it should scale perfectly. But at the moment, all direct solver I tried (mumps, superlu_dist, pastix) are not able to scale. The loos of scale is really worse, as you can see from the numbers I send before. Any ideas? Suggestions? Without a scaling solver method for these kind of systems, my multilevel FETI-DP code is just more or less a joke, only some orders of magnitude slower than standard FETI-DP method :) Thomas Zitat von Jed Brown : > MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). What MPI > implementation have you been using? Is the behavior different with a > different implementation? > > > On Fri, Dec 21, 2012 at 2:36 AM, Thomas Witkowski < > thomas.witkowski at tu-dresden.de> wrote: > >> Okay, I did a similar benchmark now with PETSc's event logging: >> >> UMFPACK >> 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 >> 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 >> 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 >> 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 >> 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 0.0e+00 >> 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 >> >> MUMPS >> 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 0.0e+00 >> 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 >> 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 0.0e+00 >> 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 >> 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 0.0e+00 >> 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 >> >> >> As you see, the local solves with UMFPACK have nearly constant time with >> increasing number of subdomains. This is what I expect. The I replace >> UMFPACK by MUMPS and I see increasing time for local solves. 
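For the factor-once / solve-many pattern Thomas summarizes above, a sketch of the usual PETSc 3.3-era setup (KSPPREONLY plus PCLU, factorization triggered once by KSPSetUp, solver package chosen the same way as -pc_factor_mat_solver_package); the communicator, matrix and function name are placeholders, not code from this thread:

  #include <petscksp.h>

  /* One LU factorization up front (KSPSetUp), then many cheap solves via
     KSPSolve.  "mumps" matches -pc_factor_mat_solver_package mumps;
     superlu_dist or umfpack can be substituted the same way. */
  static PetscErrorCode SetUpDirectSolver(MPI_Comm subcomm, Mat Acoarse, KSP *ksp)
  {
    PetscErrorCode ierr;
    PC             pc;

    PetscFunctionBegin;
    ierr = KSPCreate(subcomm, ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(*ksp, Acoarse, Acoarse, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
    ierr = KSPSetType(*ksp, KSPPREONLY);CHKERRQ(ierr);
    ierr = KSPGetPC(*ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
    ierr = PCFactorSetMatSolverPackage(pc, "mumps");CHKERRQ(ierr);
    ierr = KSPSetFromOptions(*ksp);CHKERRQ(ierr);
    ierr = KSPSetUp(*ksp);CHKERRQ(ierr);   /* LU factorization happens here */
    PetscFunctionReturn(0);
  }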
In the last >> columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's column >> increases here from 75 to 82. What does this mean? >> >> Thomas >> >> Am 21.12.2012 02:19, schrieb Matthew Knepley: >> >> On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski >>> > >>> wrote: >>> >>>> I cannot use the information from log_summary, as I have three different >>>> LU >>>> factorizations and solve (local matrices and two hierarchies of coarse >>>> grids). Therefore, I use the following work around to get the timing of >>>> the >>>> solve I'm intrested in: >>>> >>> You misunderstand how to use logging. You just put these thing in >>> separate stages. Stages represent >>> parts of the code over which events are aggregated. >>> >>> Matt >>> >>> MPI::COMM_WORLD.Barrier(); >>>> wtime = MPI::Wtime(); >>>> KSPSolve(*(data->ksp_schur_**primal_local), tmp_primal, >>>> tmp_primal); >>>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>>> >>>> The factorization is done explicitly before with "KSPSetUp", so I can >>>> measure the time for LU factorization. It also does not scale! For 64 >>>> cores, >>>> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, >>>> the >>>> local coarse space matrices defined on four cores have exactly the same >>>> number of rows and exactly the same number of non zero entries. So, from >>>> my >>>> point of view, the time should be absolutely constant. >>>> >>>> Thomas >>>> >>>> Zitat von Barry Smith : >>>> >>>> >>>> Are you timing ONLY the time to factor and solve the subproblems? Or >>>>> also the time to get the data to the collection of 4 cores at a time? >>>>> >>>>> If you are only using LU for these problems and not elsewhere in >>>>> the >>>>> code you can get the factorization and time from MatLUFactor() and >>>>> MatSolve() or you can use stages to put this calculation in its own >>>>> stage >>>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>>> Also look at the load balancing column for the factorization and solve >>>>> stage, it is well balanced? >>>>> >>>>> Barry >>>>> >>>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >>>>> > >>>>> wrote: >>>>> >>>>> In my multilevel FETI-DP code, I have localized course matrices, which >>>>>> are defined on only a subset of all MPI tasks, typically between 4 >>>>>> and 64 >>>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU >>>>>> factorization of >>>>>> the matrices is computed with either MUMPS or superlu_dist, but both >>>>>> show >>>>>> some scaling property I really wonder of: When the overall problem >>>>>> size is >>>>>> increased, the solve with the LU factorization of the local matrices >>>>>> does >>>>>> not scale! But why not? I just increase the number of local matrices, >>>>>> but >>>>>> all of them are independent of each other. Some example: I use 64 >>>>>> cores, >>>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI >>>>>> communicators >>>>>> with 16 coarse space matrices. The problem need to solve 192 times >>>>>> with the >>>>>> coarse space systems, and this takes together 0.09 seconds. Now I >>>>>> increase >>>>>> the number of cores to 256, but let the local coarse space be defined >>>>>> again >>>>>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>>>>> required, but now this takes 0.24 seconds. The same for 1024 cores, >>>>>> and we >>>>>> are at 1.7 seconds for the local coarse space solver! >>>>>> >>>>>> For me, this is a total mystery! 
Any idea how to explain, debug and >>>>>> eventually how to resolve this problem? >>>>>> >>>>>> Thomas >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which >>> their experiments lead. >>> -- Norbert Wiener >>> >> >> > From agrayver at gfz-potsdam.de Fri Dec 21 10:00:10 2012 From: agrayver at gfz-potsdam.de (Alexander Grayver) Date: Fri, 21 Dec 2012 17:00:10 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> Message-ID: <50D4878A.9080004@gfz-potsdam.de> Thomas, I'm missing one point... You run N sequential factorizations (i.e. each has its own matrix to work with and no need to communicate?) independently within ONE node? Or there are N factorizations that run on N nodes? Jed, > MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). Any reason they do it that way? Which part of the code is that (i.e. analysis/factorization/solution.)? Regards, Alexander On 21.12.2012 16:51, Thomas Witkowski wrote: > I use a modified MPICH version. On the system I use for these > benchmarks I cannot use another MPI library. > > I'm not fixed to MUMPS. Superlu_dist, for example, works also > perfectly for this. But there is still the following problem I cannot > solve: When I increase the number of coarse space matrices, there > seems to be no scaling direct solver for this. Just to summaries: > - one coarse space matrix is created always by one "cluster" > consisting of four subdomanins/MPI tasks > - the four tasks are always local to one node, thus inter-node network > communication is not required for computing factorization and solve > - independent of the number of cluster, the coarse space matrices are > the same, have the same number of rows, nnz structure but possibly > different values > - there is NO load unbalancing > - the matrices must be factorized and there are a lot of solves (> > 100) with them > > It should be pretty clear, that computing LU factorization and solving > with it should scale perfectly. But at the moment, all direct solver I > tried (mumps, superlu_dist, pastix) are not able to scale. The loos of > scale is really worse, as you can see from the numbers I send before. > > Any ideas? Suggestions? Without a scaling solver method for these kind > of systems, my multilevel FETI-DP code is just more or less a joke, > only some orders of magnitude slower than standard FETI-DP method :) > > Thomas From knepley at gmail.com Fri Dec 21 10:00:21 2012 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 21 Dec 2012 11:00:21 -0500 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> Message-ID: On Fri, Dec 21, 2012 at 10:51 AM, Thomas Witkowski wrote: > I use a modified MPICH version. 
On the system I use for these benchmarks I > cannot use another MPI library. > > I'm not fixed to MUMPS. Superlu_dist, for example, works also perfectly for > this. But there is still the following problem I cannot solve: When I > increase the number of coarse space matrices, there seems to be no scaling > direct solver for this. Just to summaries: > - one coarse space matrix is created always by one "cluster" consisting of > four subdomanins/MPI tasks > - the four tasks are always local to one node, thus inter-node network > communication is not required for computing factorization and solve > - independent of the number of cluster, the coarse space matrices are the > same, have the same number of rows, nnz structure but possibly different > values > - there is NO load unbalancing > - the matrices must be factorized and there are a lot of solves (> 100) with > them So the numbers you have below for UMFPACK are using one matrix per MPI rank instead of one matrix per 4 ranks? There seem to be two obvious sources of bugs: 1) Your parallel solver is not just using the comm with 4 ranks 2) These ranks are not clustered together on one node for that comm Matt > It should be pretty clear, that computing LU factorization and solving with > it should scale perfectly. But at the moment, all direct solver I tried > (mumps, superlu_dist, pastix) are not able to scale. The loos of scale is > really worse, as you can see from the numbers I send before. > > Any ideas? Suggestions? Without a scaling solver method for these kind of > systems, my multilevel FETI-DP code is just more or less a joke, only some > orders of magnitude slower than standard FETI-DP method :) > > Thomas > > Zitat von Jed Brown : > >> MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). What MPI >> implementation have you been using? Is the behavior different with a >> different implementation? >> >> >> On Fri, Dec 21, 2012 at 2:36 AM, Thomas Witkowski < >> thomas.witkowski at tu-dresden.de> wrote: >> >>> Okay, I did a similar benchmark now with PETSc's event logging: >>> >>> UMFPACK >>> 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 >>> 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 >>> 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 >>> >>> MUMPS >>> 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 >>> 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 >>> 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 >>> >>> >>> As you see, the local solves with UMFPACK have nearly constant time with >>> increasing number of subdomains. This is what I expect. The I replace >>> UMFPACK by MUMPS and I see increasing time for local solves. In the last >>> columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's >>> column >>> increases here from 75 to 82. What does this mean? >>> >>> Thomas >>> >>> Am 21.12.2012 02:19, schrieb Matthew Knepley: >>> >>> On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski >>>> >>>> > >>>> >>>> wrote: >>>> >>>>> I cannot use the information from log_summary, as I have three >>>>> different >>>>> LU >>>>> factorizations and solve (local matrices and two hierarchies of coarse >>>>> grids). 
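On Matt's second point, a quick diagnostic sketch for checking which node each rank of a cluster communicator actually landed on; subcomm is assumed to be the per-cluster (4-rank) communicator and the printed format is arbitrary:

  #include <petscsys.h>

  /* Print, per rank, the world rank, the rank within the cluster communicator
     and the host name, to verify that each group of four ranks shares a node. */
  static PetscErrorCode ReportPlacement(MPI_Comm subcomm)
  {
    PetscErrorCode ierr;
    PetscMPIInt    wrank, srank, len;
    char           host[MPI_MAX_PROCESSOR_NAME];

    PetscFunctionBegin;
    ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &wrank);CHKERRQ(ierr);
    ierr = MPI_Comm_rank(subcomm, &srank);CHKERRQ(ierr);
    ierr = MPI_Get_processor_name(host, &len);CHKERRQ(ierr);
    ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "world rank %d, subcomm rank %d, node %s\n",
                                   wrank, srank, host);CHKERRQ(ierr);
    ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }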
Therefore, I use the following work around to get the timing of >>>>> the >>>>> solve I'm intrested in: >>>>> >>>> You misunderstand how to use logging. You just put these thing in >>>> separate stages. Stages represent >>>> parts of the code over which events are aggregated. >>>> >>>> Matt >>>> >>>> MPI::COMM_WORLD.Barrier(); >>>>> >>>>> wtime = MPI::Wtime(); >>>>> KSPSolve(*(data->ksp_schur_**primal_local), tmp_primal, >>>>> >>>>> tmp_primal); >>>>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>>>> >>>>> The factorization is done explicitly before with "KSPSetUp", so I can >>>>> measure the time for LU factorization. It also does not scale! For 64 >>>>> cores, >>>>> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, >>>>> the >>>>> local coarse space matrices defined on four cores have exactly the same >>>>> number of rows and exactly the same number of non zero entries. So, >>>>> from >>>>> my >>>>> point of view, the time should be absolutely constant. >>>>> >>>>> Thomas >>>>> >>>>> Zitat von Barry Smith : >>>>> >>>>> >>>>> Are you timing ONLY the time to factor and solve the subproblems? >>>>> Or >>>>>> >>>>>> also the time to get the data to the collection of 4 cores at a time? >>>>>> >>>>>> If you are only using LU for these problems and not elsewhere in >>>>>> the >>>>>> code you can get the factorization and time from MatLUFactor() and >>>>>> MatSolve() or you can use stages to put this calculation in its own >>>>>> stage >>>>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>>>> Also look at the load balancing column for the factorization and >>>>>> solve >>>>>> stage, it is well balanced? >>>>>> >>>>>> Barry >>>>>> >>>>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >>>>>> > >>>>>> >>>>>> wrote: >>>>>> >>>>>> In my multilevel FETI-DP code, I have localized course matrices, >>>>>> which >>>>>>> >>>>>>> are defined on only a subset of all MPI tasks, typically between 4 >>>>>>> and 64 >>>>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU >>>>>>> factorization of >>>>>>> the matrices is computed with either MUMPS or superlu_dist, but both >>>>>>> show >>>>>>> some scaling property I really wonder of: When the overall problem >>>>>>> size is >>>>>>> increased, the solve with the LU factorization of the local matrices >>>>>>> does >>>>>>> not scale! But why not? I just increase the number of local >>>>>>> matrices, >>>>>>> but >>>>>>> all of them are independent of each other. Some example: I use 64 >>>>>>> cores, >>>>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI >>>>>>> communicators >>>>>>> with 16 coarse space matrices. The problem need to solve 192 times >>>>>>> with the >>>>>>> coarse space systems, and this takes together 0.09 seconds. Now I >>>>>>> increase >>>>>>> the number of cores to 256, but let the local coarse space be >>>>>>> defined >>>>>>> again >>>>>>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>>>>>> required, but now this takes 0.24 seconds. The same for 1024 cores, >>>>>>> and we >>>>>>> are at 1.7 seconds for the local coarse space solver! >>>>>>> >>>>>>> For me, this is a total mystery! Any idea how to explain, debug and >>>>>>> eventually how to resolve this problem? 
>>>>>>> >>>>>>> Thomas >>>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which >>>> their experiments lead. >>>> -- Norbert Wiener >>>> >>> >>> >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From jedbrown at mcs.anl.gov Fri Dec 21 10:01:27 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Fri, 21 Dec 2012 09:01:27 -0700 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> Message-ID: Can you reproduce this in a simpler environment so that we can report it? As I understand your statement, it sounds like you could reproduce by changing src/ksp/ksp/examples/tutorials/ex10.c to create a subcomm of size 4 and the using that everywhere, then compare log_summary running on 4 cores to running on more (despite everything really being independent) It would also be worth using an MPI profiler to see if it's really spending a lot of time in MPI_Iprobe. Since SuperLU_DIST does not use MPI_Iprobe, it may be something else. On Fri, Dec 21, 2012 at 8:51 AM, Thomas Witkowski < Thomas.Witkowski at tu-dresden.de> wrote: > I use a modified MPICH version. On the system I use for these benchmarks I > cannot use another MPI library. > > I'm not fixed to MUMPS. Superlu_dist, for example, works also perfectly > for this. But there is still the following problem I cannot solve: When I > increase the number of coarse space matrices, there seems to be no scaling > direct solver for this. Just to summaries: > - one coarse space matrix is created always by one "cluster" consisting of > four subdomanins/MPI tasks > - the four tasks are always local to one node, thus inter-node network > communication is not required for computing factorization and solve > - independent of the number of cluster, the coarse space matrices are the > same, have the same number of rows, nnz structure but possibly different > values > - there is NO load unbalancing > - the matrices must be factorized and there are a lot of solves (> 100) > with them > > It should be pretty clear, that computing LU factorization and solving > with it should scale perfectly. But at the moment, all direct solver I > tried (mumps, superlu_dist, pastix) are not able to scale. The loos of > scale is really worse, as you can see from the numbers I send before. > > Any ideas? Suggestions? Without a scaling solver method for these kind of > systems, my multilevel FETI-DP code is just more or less a joke, only some > orders of magnitude slower than standard FETI-DP method :) > > Thomas > > Zitat von Jed Brown : > > MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). What MPI >> implementation have you been using? Is the behavior different with a >> different implementation? 
>> >> >> On Fri, Dec 21, 2012 at 2:36 AM, Thomas Witkowski < >> thomas.witkowski at tu-dresden.de**> wrote: >> >> Okay, I did a similar benchmark now with PETSc's event logging: >>> >>> UMFPACK >>> 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 >>> 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 >>> 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 >>> >>> MUMPS >>> 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 >>> 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 >>> 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 >>> >>> >>> As you see, the local solves with UMFPACK have nearly constant time with >>> increasing number of subdomains. This is what I expect. The I replace >>> UMFPACK by MUMPS and I see increasing time for local solves. In the last >>> columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's >>> column >>> increases here from 75 to 82. What does this mean? >>> >>> Thomas >>> >>> Am 21.12.2012 02:19, schrieb Matthew Knepley: >>> >>> On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski >>> >>>> >>>> >> >>>> >>>> wrote: >>>> >>>> I cannot use the information from log_summary, as I have three >>>>> different >>>>> LU >>>>> factorizations and solve (local matrices and two hierarchies of coarse >>>>> grids). Therefore, I use the following work around to get the timing of >>>>> the >>>>> solve I'm intrested in: >>>>> >>>>> You misunderstand how to use logging. You just put these thing in >>>> separate stages. Stages represent >>>> parts of the code over which events are aggregated. >>>> >>>> Matt >>>> >>>> MPI::COMM_WORLD.Barrier(); >>>> >>>>> wtime = MPI::Wtime(); >>>>> KSPSolve(*(data->ksp_schur_****primal_local), tmp_primal, >>>>> >>>>> tmp_primal); >>>>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>>>> >>>>> The factorization is done explicitly before with "KSPSetUp", so I can >>>>> measure the time for LU factorization. It also does not scale! For 64 >>>>> cores, >>>>> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, >>>>> the >>>>> local coarse space matrices defined on four cores have exactly the same >>>>> number of rows and exactly the same number of non zero entries. So, >>>>> from >>>>> my >>>>> point of view, the time should be absolutely constant. >>>>> >>>>> Thomas >>>>> >>>>> Zitat von Barry Smith : >>>>> >>>>> >>>>> Are you timing ONLY the time to factor and solve the subproblems? >>>>> Or >>>>> >>>>>> also the time to get the data to the collection of 4 cores at a time? >>>>>> >>>>>> If you are only using LU for these problems and not elsewhere in >>>>>> the >>>>>> code you can get the factorization and time from MatLUFactor() and >>>>>> MatSolve() or you can use stages to put this calculation in its own >>>>>> stage >>>>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>>>> Also look at the load balancing column for the factorization and >>>>>> solve >>>>>> stage, it is well balanced? 
>>>>>> >>>>>> Barry >>>>>> >>>>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >>>>>> >>>>>> >> >>>>>> >>>>>> wrote: >>>>>> >>>>>> In my multilevel FETI-DP code, I have localized course matrices, >>>>>> which >>>>>> >>>>>>> are defined on only a subset of all MPI tasks, typically between 4 >>>>>>> and 64 >>>>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU >>>>>>> factorization of >>>>>>> the matrices is computed with either MUMPS or superlu_dist, but both >>>>>>> show >>>>>>> some scaling property I really wonder of: When the overall problem >>>>>>> size is >>>>>>> increased, the solve with the LU factorization of the local matrices >>>>>>> does >>>>>>> not scale! But why not? I just increase the number of local >>>>>>> matrices, >>>>>>> but >>>>>>> all of them are independent of each other. Some example: I use 64 >>>>>>> cores, >>>>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI >>>>>>> communicators >>>>>>> with 16 coarse space matrices. The problem need to solve 192 times >>>>>>> with the >>>>>>> coarse space systems, and this takes together 0.09 seconds. Now I >>>>>>> increase >>>>>>> the number of cores to 256, but let the local coarse space be >>>>>>> defined >>>>>>> again >>>>>>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>>>>>> required, but now this takes 0.24 seconds. The same for 1024 cores, >>>>>>> and we >>>>>>> are at 1.7 seconds for the local coarse space solver! >>>>>>> >>>>>>> For me, this is a total mystery! Any idea how to explain, debug and >>>>>>> eventually how to resolve this problem? >>>>>>> >>>>>>> Thomas >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which >>>> their experiments lead. >>>> -- Norbert Wiener >>>> >>>> >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Fri Dec 21 10:04:09 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Fri, 21 Dec 2012 09:04:09 -0700 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <50D4878A.9080004@gfz-potsdam.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> <50D4878A.9080004@gfz-potsdam.de> Message-ID: On Fri, Dec 21, 2012 at 9:00 AM, Alexander Grayver wrote: > > MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). > > Any reason they do it that way? Which part of the code is that (i.e. > analysis/factorization/**solution.)? > They should Iprobe on the proper communicator. I don't know if it's a mistake in MUMPS or if they were working around a historical bug in some MPI implementation. At this point, we don't have sufficiently detailed profiling to determine that this has anything to do with the strange performance degradation that Thomas is seeing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Thomas.Witkowski at tu-dresden.de Fri Dec 21 15:05:21 2012 From: Thomas.Witkowski at tu-dresden.de (Thomas Witkowski) Date: Fri, 21 Dec 2012 22:05:21 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? 
In-Reply-To: References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> Message-ID: <20121221220521.qbp4io8kws040o8g@mail.zih.tu-dresden.de> So, here it is. Just compile and run with mpiexec -np 64 ./ex10 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -log_summary 64 cores: 0.09 seconds for solving 1024 cores: 2.6 seconds for solving Thomas Zitat von Jed Brown : > Can you reproduce this in a simpler environment so that we can report it? > As I understand your statement, it sounds like you could reproduce by > changing src/ksp/ksp/examples/tutorials/ex10.c to create a subcomm of size > 4 and the using that everywhere, then compare log_summary running on 4 > cores to running on more (despite everything really being independent) > > It would also be worth using an MPI profiler to see if it's really spending > a lot of time in MPI_Iprobe. Since SuperLU_DIST does not use MPI_Iprobe, it > may be something else. > > On Fri, Dec 21, 2012 at 8:51 AM, Thomas Witkowski < > Thomas.Witkowski at tu-dresden.de> wrote: > >> I use a modified MPICH version. On the system I use for these benchmarks I >> cannot use another MPI library. >> >> I'm not fixed to MUMPS. Superlu_dist, for example, works also perfectly >> for this. But there is still the following problem I cannot solve: When I >> increase the number of coarse space matrices, there seems to be no scaling >> direct solver for this. Just to summaries: >> - one coarse space matrix is created always by one "cluster" consisting of >> four subdomanins/MPI tasks >> - the four tasks are always local to one node, thus inter-node network >> communication is not required for computing factorization and solve >> - independent of the number of cluster, the coarse space matrices are the >> same, have the same number of rows, nnz structure but possibly different >> values >> - there is NO load unbalancing >> - the matrices must be factorized and there are a lot of solves (> 100) >> with them >> >> It should be pretty clear, that computing LU factorization and solving >> with it should scale perfectly. But at the moment, all direct solver I >> tried (mumps, superlu_dist, pastix) are not able to scale. The loos of >> scale is really worse, as you can see from the numbers I send before. >> >> Any ideas? Suggestions? Without a scaling solver method for these kind of >> systems, my multilevel FETI-DP code is just more or less a joke, only some >> orders of magnitude slower than standard FETI-DP method :) >> >> Thomas >> >> Zitat von Jed Brown : >> >> MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). What MPI >>> implementation have you been using? Is the behavior different with a >>> different implementation? 
>>> >>> >>> On Fri, Dec 21, 2012 at 2:36 AM, Thomas Witkowski < >>> thomas.witkowski at tu-dresden.de**> wrote: >>> >>> Okay, I did a similar benchmark now with PETSc's event logging: >>>> >>>> UMFPACK >>>> 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 >>>> 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 >>>> 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 >>>> 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 >>>> 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 0.0e+00 >>>> 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 >>>> >>>> MUMPS >>>> 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 0.0e+00 >>>> 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 >>>> 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 0.0e+00 >>>> 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 >>>> 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 0.0e+00 >>>> 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 >>>> >>>> >>>> As you see, the local solves with UMFPACK have nearly constant time with >>>> increasing number of subdomains. This is what I expect. The I replace >>>> UMFPACK by MUMPS and I see increasing time for local solves. In the last >>>> columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's >>>> column >>>> increases here from 75 to 82. What does this mean? >>>> >>>> Thomas >>>> >>>> Am 21.12.2012 02:19, schrieb Matthew Knepley: >>>> >>>> On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski >>>> >>>>> >>>> >>>>> >> >>>>> >>>>> wrote: >>>>> >>>>> I cannot use the information from log_summary, as I have three >>>>>> different >>>>>> LU >>>>>> factorizations and solve (local matrices and two hierarchies of coarse >>>>>> grids). Therefore, I use the following work around to get the timing of >>>>>> the >>>>>> solve I'm intrested in: >>>>>> >>>>>> You misunderstand how to use logging. You just put these thing in >>>>> separate stages. Stages represent >>>>> parts of the code over which events are aggregated. >>>>> >>>>> Matt >>>>> >>>>> MPI::COMM_WORLD.Barrier(); >>>>> >>>>>> wtime = MPI::Wtime(); >>>>>> KSPSolve(*(data->ksp_schur_****primal_local), tmp_primal, >>>>>> >>>>>> tmp_primal); >>>>>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>>>>> >>>>>> The factorization is done explicitly before with "KSPSetUp", so I can >>>>>> measure the time for LU factorization. It also does not scale! For 64 >>>>>> cores, >>>>>> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, >>>>>> the >>>>>> local coarse space matrices defined on four cores have exactly the same >>>>>> number of rows and exactly the same number of non zero entries. So, >>>>>> from >>>>>> my >>>>>> point of view, the time should be absolutely constant. >>>>>> >>>>>> Thomas >>>>>> >>>>>> Zitat von Barry Smith : >>>>>> >>>>>> >>>>>> Are you timing ONLY the time to factor and solve the subproblems? >>>>>> Or >>>>>> >>>>>>> also the time to get the data to the collection of 4 cores at a time? >>>>>>> >>>>>>> If you are only using LU for these problems and not elsewhere in >>>>>>> the >>>>>>> code you can get the factorization and time from MatLUFactor() and >>>>>>> MatSolve() or you can use stages to put this calculation in its own >>>>>>> stage >>>>>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>>>>> Also look at the load balancing column for the factorization and >>>>>>> solve >>>>>>> stage, it is well balanced? 
>>>>>>> Barry
>>>>>>>
>>>>>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski wrote:
>>>>>>>
>>>>>>>> In my multilevel FETI-DP code, I have localized course matrices, which
>>>>>>>> are defined on only a subset of all MPI tasks, typically between 4 and 64
>>>>>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI
>>>>>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU factorization of
>>>>>>>> the matrices is computed with either MUMPS or superlu_dist, but both show
>>>>>>>> some scaling property I really wonder of: When the overall problem size is
>>>>>>>> increased, the solve with the LU factorization of the local matrices does
>>>>>>>> not scale! But why not? I just increase the number of local matrices, but
>>>>>>>> all of them are independent of each other. Some example: I use 64 cores,
>>>>>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI communicators
>>>>>>>> with 16 coarse space matrices. The problem need to solve 192 times with the
>>>>>>>> coarse space systems, and this takes together 0.09 seconds. Now I increase
>>>>>>>> the number of cores to 256, but let the local coarse space be defined again
>>>>>>>> on only 4 cores. Again, 192 solutions with these coarse spaces are
>>>>>>>> required, but now this takes 0.24 seconds. The same for 1024 cores, and we
>>>>>>>> are at 1.7 seconds for the local coarse space solver!
>>>>>>>>
>>>>>>>> For me, this is a total mystery! Any idea how to explain, debug and
>>>>>>>> eventually how to resolve this problem?
>>>>>>>>
>>>>>>>> Thomas
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which
>>>>> their experiments lead.
>>>>> -- Norbert Wiener
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex10.c
Type: text/x-c++src
Size: 3496 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Desc: not available
URL:
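The modified ex10.c attachment itself was scrubbed, so the following is only an illustrative sketch of the kind of change Jed suggested (groups of four consecutive ranks on their own communicator, with the solver chosen via the options given in the run line above); it is an assumption about the structure, not Thomas's actual file:

  #include <petscksp.h>

  /* Group every four consecutive world ranks into one communicator and build
     the matrix and KSP there, so each group factors and solves its own
     independent system.  Matrix assembly and the right-hand side are omitted. */
  int main(int argc, char **argv)
  {
    PetscErrorCode ierr;
    PetscMPIInt    rank;
    MPI_Comm       subcomm;
    Mat            A;
    KSP            ksp;

    ierr = PetscInitialize(&argc, &argv, (char*)0, (char*)0); if (ierr) return ierr;
    ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
    /* color = rank/4 groups ranks {0..3}, {4..7}, ... into clusters of four */
    ierr = MPI_Comm_split(PETSC_COMM_WORLD, rank/4, rank, &subcomm);CHKERRQ(ierr);

    ierr = MatCreate(subcomm, &A);CHKERRQ(ierr);
    /* ... load or assemble the coarse matrix on subcomm here ... */

    ierr = KSPCreate(subcomm, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* -ksp_type preonly -pc_type lu ... */
    /* ... repeated KSPSolve() with vectors created on subcomm ... */

    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }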
From gokhalen at gmail.com Fri Dec 21 15:16:29 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Fri, 21 Dec 2012 16:16:29 -0500 Subject: [petsc-users] getting a sub matrix from a matrix Message-ID: I have a dense matrix A (100x100) and I want to extract a matrix B from it consisting of the first N columns of A. Is there a better way to do it than getting the column using MatGetColumnVector, followed by VecGetArray, and MatSetValues? It could also be done using MatGetSubMatrix but is seems to be more involved. Thanks, -Nachiket -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Dec 21 15:34:25 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 21 Dec 2012 15:34:25 -0600 Subject: [petsc-users] getting a sub matrix from a matrix In-Reply-To: References: Message-ID: <609E3A9E-766B-4A02-AF6C-0E4D39CBF0FC@mcs.anl.gov> On Dec 21, 2012, at 3:16 PM, Nachiket Gokhale wrote: > I have a dense matrix A (100x100) and I want to extract a matrix B from it consisting of the first N columns of A. Is there a better way to do it than getting the column using MatGetColumnVector, followed by VecGetArray, and MatSetValues? It could also be done using MatGetSubMatrix but is seems to be more involved. MatGetSubMatrix() is exactly for this purpose and should not be particularly involved. Use ISCreateStride() to create an IS to indicate all the rows and and another ISCreateStride to indicate the 0 to N-1 columns. Barry > > Thanks, > > -Nachiket From s_g at berkeley.edu Sun Dec 23 18:48:36 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Sun, 23 Dec 2012 16:48:36 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve Message-ID: <50D7A664.6080802@berkeley.edu> I wanted to use SuperLU Dist to perform a direct solve but seem to be encountering a problem. I was wonder if this is a know issue and if there is a solution for it. The problem is easily observed using ex6.c in src/ksp/ksp/examples/tests. Out of the box: make runex6 produces a residual error of O(1e-11), all is well. I then changed the run to run on two processors and add the flag -pc_factor_mat_solver_package spooles this produces a residual error of O(1e-11), all is still well. I then switch over to -pc_factor_mat_solver_package superlu_dist and the residual error comes back as 22.6637! Something seems very wrong. My build is perfectly vanilla: export PETSC_DIR=/Users/sg/petsc-3.3-p5/ export PETSC_ARCH=intel ./configure --with-cc=icc --with-fc=ifort \ -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test -sanjay -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sun Dec 23 18:56:36 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sun, 23 Dec 2012 18:56:36 -0600 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50D7A664.6080802@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> Message-ID: Where is your matrix? It might be ending up with a very bad pivot. If the problem can be reproduced, it should be reported to the SuperLU_DIST developers to fix. (Note that we do not see this with other matrices.) You can also try MUMPS. On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee wrote: > I wanted to use SuperLU Dist to perform a direct solve but seem to be > encountering > a problem.
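As a follow-up to Barry's MatGetSubMatrix()/ISCreateStride() suggestion in the sub-matrix thread above, a small uniprocess sketch with petsc-3.3 calling sequences; the helper name ExtractFirstColumns is an illustrative assumption, and in parallel each process would instead list only the rows and columns it is to own in the submatrix:

#include <petscmat.h>

/* B = the first N columns of A (all rows kept), built from two stride index sets. */
PetscErrorCode ExtractFirstColumns(Mat A, PetscInt N, Mat *B)
{
  PetscInt       nrows, ncols;
  IS             isrow, iscol;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatGetSize(A, &nrows, &ncols);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, nrows, 0, 1, &isrow);CHKERRQ(ierr); /* all rows       */
  ierr = ISCreateStride(PETSC_COMM_SELF, N, 0, 1, &iscol);CHKERRQ(ierr);     /* columns 0..N-1 */
  ierr = MatGetSubMatrix(A, isrow, iscol, MAT_INITIAL_MATRIX, B);CHKERRQ(ierr);
  ierr = ISDestroy(&isrow);CHKERRQ(ierr);
  ierr = ISDestroy(&iscol);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

For the 100x100 dense case in the question this replaces the per-column MatGetColumnVector/VecGetArray/MatSetValues loop with a single call.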
I was wonder if this is a know issue and if there is a > solution for it. > > The problem is easily observed using ex6.c in src/ksp/ksp/examples/tests. > > Out of the box: make runex6 produces a residual error of O(1e-11), all is > well. > > I then changed the run to run on two processors and add the flag > -pc_factor_mat_solver_package spooles this produces a residual error of > O(1e-11), all is still well. > > I then switch over to -pc_factor_mat_solver_package superlu_dist and the > residual error comes back as 22.6637! Something seems very wrong. > > My build is perfectly vanilla: > > export PETSC_DIR=/Users/sg/petsc-3.3-p5/ > export PETSC_ARCH=intel > > ./configure --with-cc=icc --with-fc=ifort \ > -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} > > make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all > make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test > > -sanjay > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Sun Dec 23 19:08:37 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Sun, 23 Dec 2012 17:08:37 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> Message-ID: <50D7AB15.5040606@berkeley.edu> Not sure what you mean by where is your matrix? I am simply running ex6 in the ksp/examples/tests directory. The reason I ran this test is because I was seeing the same behavior with my finite element code (on perfectly benign problems). Is there a built-in test that you use to check that superlu_dist is working properly with petsc? i.e. something you know that works with with petsc 3.3-p5? -sanjay On 12/23/12 4:56 PM, Jed Brown wrote: > Where is your matrix? It might be ending up with a very bad pivot. If > the problem can be reproduced, it should be reported to the > SuperLU_DIST developers to fix. (Note that we do not see this with > other matrices.) You can also try MUMPS. > > > On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee > wrote: > > I wanted to use SuperLU Dist to perform a direct solve but seem to > be encountering > a problem. I was wonder if this is a know issue and if there is a > solution for it. > > The problem is easily observed using ex6.c in > src/ksp/ksp/examples/tests. > > Out of the box: make runex6 produces a residual error of O(1e-11), > all is well. > > I then changed the run to run on two processors and add the flag > -pc_factor_mat_solver_package spooles this produces a residual > error of O(1e-11), all is still well. > > I then switch over to -pc_factor_mat_solver_package superlu_dist > and the > residual error comes back as 22.6637! Something seems very wrong. 
> > My build is perfectly vanilla: > > export PETSC_DIR=/Users/sg/petsc-3.3-p5/ > export PETSC_ARCH=intel > > ./configure --with-cc=icc --with-fc=ifort \ > -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} > > make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all > make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test > > -sanjay > > -- ----------------------------------------------- Sanjay Govindjee, PhD, PE Professor of Civil Engineering Vice Chair for Academic Affairs 779 Davis Hall Structural Engineering, Mechanics and Materials Department of Civil Engineering University of California Berkeley, CA 94720-1710 Voice: +1 510 642 6060 FAX: +1 510 643 5264 s_g at berkeley.edu http://www.ce.berkeley.edu/~sanjay ----------------------------------------------- New Books: Engineering Mechanics of Deformable Solids: A Presentation with Exercises http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 http://ukcatalogue.oup.com/product/9780199651641.do http://amzn.com/0199651647 Engineering Mechanics 3 (Dynamics) http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 http://amzn.com/3642140181 ----------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Dec 23 19:58:37 2012 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 23 Dec 2012 20:58:37 -0500 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50D7AB15.5040606@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> Message-ID: On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee wrote: > Not sure what you mean by where is your matrix? I am simply running ex6 > in the ksp/examples/tests directory. > > The reason I ran this test is because I was seeing the same behavior with > my finite element code (on perfectly benign problems). > > Is there a built-in test that you use to check that superlu_dist is > working properly with petsc? > i.e. something you know that works with with petsc 3.3-p5? > 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian 2) Compare with MUMPS Matt > -sanjay > > > > On 12/23/12 4:56 PM, Jed Brown wrote: > > Where is your matrix? It might be ending up with a very bad pivot. If the > problem can be reproduced, it should be reported to the SuperLU_DIST > developers to fix. (Note that we do not see this with other matrices.) You > can also try MUMPS. > > > On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee wrote: > >> I wanted to use SuperLU Dist to perform a direct solve but seem to be >> encountering >> a problem. I was wonder if this is a know issue and if there is a >> solution for it. >> >> The problem is easily observed using ex6.c in src/ksp/ksp/examples/tests. >> >> Out of the box: make runex6 produces a residual error of O(1e-11), all is >> well. >> >> I then changed the run to run on two processors and add the flag >> -pc_factor_mat_solver_package spooles this produces a residual error of >> O(1e-11), all is still well. >> >> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >> residual error comes back as 22.6637! Something seems very wrong. 
>> >> My build is perfectly vanilla: >> >> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >> export PETSC_ARCH=intel >> >> ./configure --with-cc=icc --with-fc=ifort \ >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >> >> -sanjay >> > > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice: +1 510 642 6060 > FAX: +1 510 643 5264s_g at berkeley.eduhttp://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exerciseshttp://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641http://ukcatalogue.oup.com/product/9780199651641.dohttp://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics)http://www.springer.com/materials/mechanics/book/978-3-642-14018-1http://amzn.com/3642140181 > > ----------------------------------------------- > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Sun Dec 23 20:01:35 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Sun, 23 Dec 2012 18:01:35 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> Message-ID: <50D7B77F.5010306@berkeley.edu> Would it be acceptable to use SPOOLES for the comparison? or is MUMPS needed (I would like to avoid re-doing my installation). On 12/23/12 5:58 PM, Matthew Knepley wrote: > On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee > wrote: > > Not sure what you mean by where is your matrix? I am simply > running ex6 in the ksp/examples/tests directory. > > The reason I ran this test is because I was seeing the same > behavior with my finite element code (on perfectly benign problems). > > Is there a built-in test that you use to check that superlu_dist > is working properly with petsc? > i.e. something you know that works with with petsc 3.3-p5? > > > 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian > > 2) Compare with MUMPS > > Matt > > -sanjay > > > > On 12/23/12 4:56 PM, Jed Brown wrote: >> Where is your matrix? It might be ending up with a very bad >> pivot. If the problem can be reproduced, it should be reported to >> the SuperLU_DIST developers to fix. (Note that we do not see this >> with other matrices.) You can also try MUMPS. >> >> >> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >> > wrote: >> >> I wanted to use SuperLU Dist to perform a direct solve but >> seem to be encountering >> a problem. I was wonder if this is a know issue and if there >> is a solution for it. >> >> The problem is easily observed using ex6.c in >> src/ksp/ksp/examples/tests. >> >> Out of the box: make runex6 produces a residual error of >> O(1e-11), all is well. >> >> I then changed the run to run on two processors and add the flag >> -pc_factor_mat_solver_package spooles this produces a >> residual error of O(1e-11), all is still well. 
>> >> I then switch over to -pc_factor_mat_solver_package >> superlu_dist and the >> residual error comes back as 22.6637! Something seems very wrong. >> >> My build is perfectly vanilla: >> >> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >> export PETSC_ARCH=intel >> >> ./configure --with-cc=icc --with-fc=ifort \ >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >> >> -sanjay >> >> > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice:+1 510 642 6060 > FAX:+1 510 643 5264 > s_g at berkeley.edu > http://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exercises > http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 > http://ukcatalogue.oup.com/product/9780199651641.do > http://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics) > http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 > http://amzn.com/3642140181 > > ----------------------------------------------- > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- ----------------------------------------------- Sanjay Govindjee, PhD, PE Professor of Civil Engineering Vice Chair for Academic Affairs 779 Davis Hall Structural Engineering, Mechanics and Materials Department of Civil Engineering University of California Berkeley, CA 94720-1710 Voice: +1 510 642 6060 FAX: +1 510 643 5264 s_g at berkeley.edu http://www.ce.berkeley.edu/~sanjay ----------------------------------------------- New Books: Engineering Mechanics of Deformable Solids: A Presentation with Exercises http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 http://ukcatalogue.oup.com/product/9780199651641.do http://amzn.com/0199651647 Engineering Mechanics 3 (Dynamics) http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 http://amzn.com/3642140181 ----------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sun Dec 23 20:07:23 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sun, 23 Dec 2012 20:07:23 -0600 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50D7AB15.5040606@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> Message-ID: You didn't say what options you were running ex6 with, but with the options used for the tests, I see ~/petsc/src/ksp/ksp/examples/tests$ mpirun.hydra -n 2 ./ex6 -f ~/petsc/datafiles/matrices/arco1 -pc_type lu -pc_factor_mat_solver_package superlu_dist Number of iterations = 1 Residual norm = 2.23439e-11 You need to give precise instructions for how to reproduce the behavior you are seeing. Also, for experimenting with matrices read from files, we prefer src/ksp/ksp/examples/tutorials/ex10.c because it is better commented and has more features. 
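For anyone following along, a stand-alone sketch (not the actual ex6/ex10 source) of the same experiment written against the petsc-3.3 interfaces: load a binary matrix and do a preonly/LU direct solve with an explicitly chosen factorization package. The file name "arco1" and the all-ones right-hand side are placeholder assumptions, and the package can just as well be picked at run time with -pc_factor_mat_solver_package:

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            b, x;
  KSP            ksp;
  PC             pc;
  PetscViewer    fd;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);
  /* Read a matrix stored in PETSc binary format, e.g. the arco1 test matrix. */
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "arco1", FILE_MODE_READ, &fd);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatLoad(A, fd);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&fd);CHKERRQ(ierr);

  ierr = MatGetVecs(A, &x, &b);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);  /* placeholder right-hand side */

  /* Direct solve: the KSP does no iteration, the LU factorization does all the work. */
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Switching the last argument of PCFactorSetMatSolverPackage() to MATSOLVERMUMPS gives the MUMPS comparison suggested elsewhere in the thread.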
On Sun, Dec 23, 2012 at 7:08 PM, Sanjay Govindjee wrote: > Not sure what you mean by where is your matrix? I am simply running ex6 > in the ksp/examples/tests directory. > > The reason I ran this test is because I was seeing the same behavior with > my finite element code (on perfectly benign problems). > > Is there a built-in test that you use to check that superlu_dist is > working properly with petsc? > i.e. something you know that works with with petsc 3.3-p5? > > -sanjay > > > > On 12/23/12 4:56 PM, Jed Brown wrote: > > Where is your matrix? It might be ending up with a very bad pivot. If the > problem can be reproduced, it should be reported to the SuperLU_DIST > developers to fix. (Note that we do not see this with other matrices.) You > can also try MUMPS. > > > On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee wrote: > >> I wanted to use SuperLU Dist to perform a direct solve but seem to be >> encountering >> a problem. I was wonder if this is a know issue and if there is a >> solution for it. >> >> The problem is easily observed using ex6.c in src/ksp/ksp/examples/tests. >> >> Out of the box: make runex6 produces a residual error of O(1e-11), all is >> well. >> >> I then changed the run to run on two processors and add the flag >> -pc_factor_mat_solver_package spooles this produces a residual error of >> O(1e-11), all is still well. >> >> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >> residual error comes back as 22.6637! Something seems very wrong. >> >> My build is perfectly vanilla: >> >> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >> export PETSC_ARCH=intel >> >> ./configure --with-cc=icc --with-fc=ifort \ >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >> >> -sanjay >> > > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice: +1 510 642 6060 > FAX: +1 510 643 5264s_g at berkeley.eduhttp://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exerciseshttp://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641http://ukcatalogue.oup.com/product/9780199651641.dohttp://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics)http://www.springer.com/materials/mechanics/book/978-3-642-14018-1http://amzn.com/3642140181 > > ----------------------------------------------- > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Sun Dec 23 20:15:34 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Sun, 23 Dec 2012 18:15:34 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> Message-ID: <50D7BAC6.2050807@berkeley.edu> Sorry for the confusion. I thought I was clear. Here is the make line I was running. 
-@${MPIEXEC} -n 2 ./ex6 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -options_left no \ -f arco1 > ex6_1.tmp 2>&1; \ if (${DIFF} output/ex6_1.out ex6_1.tmp) then true; \ else echo ${PWD} ; echo "Possible problem with with ex6_1, diffs above \n========================================="; fi; \ ${RM} -f ex6_1.tmp If you change superlu_dist to spooles it works just fine as well as any other iterative methods you care to try. The matrix arcos1 was downloaded as per the instructions in the makefile. I will try reproducing the superlu_dist error with snes/examples/tutorials/ex5 now. (fyi under snes/examples/tests/output the files ex5_1.out and ex5_2.out are missing one can not run the test out of the box). -sanjay On 12/23/12 6:07 PM, Jed Brown wrote: > You didn't say what options you were running ex6 with, but with the > options used for the tests, I see > > ~/petsc/src/ksp/ksp/examples/tests$ mpirun.hydra -n 2 ./ex6 -f > ~/petsc/datafiles/matrices/arco1 -pc_type lu > -pc_factor_mat_solver_package superlu_dist > Number of iterations = 1 > Residual norm = 2.23439e-11 > > > You need to give precise instructions for how to reproduce the > behavior you are seeing. > > Also, for experimenting with matrices read from files, we prefer > src/ksp/ksp/examples/tutorials/ex10.c because it is better commented > and has more features. > > > On Sun, Dec 23, 2012 at 7:08 PM, Sanjay Govindjee > wrote: > > Not sure what you mean by where is your matrix? I am simply > running ex6 in the ksp/examples/tests directory. > > The reason I ran this test is because I was seeing the same > behavior with my finite element code (on perfectly benign problems). > > Is there a built-in test that you use to check that superlu_dist > is working properly with petsc? > i.e. something you know that works with with petsc 3.3-p5? > > -sanjay > > > > On 12/23/12 4:56 PM, Jed Brown wrote: >> Where is your matrix? It might be ending up with a very bad >> pivot. If the problem can be reproduced, it should be reported to >> the SuperLU_DIST developers to fix. (Note that we do not see this >> with other matrices.) You can also try MUMPS. >> >> >> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >> > wrote: >> >> I wanted to use SuperLU Dist to perform a direct solve but >> seem to be encountering >> a problem. I was wonder if this is a know issue and if there >> is a solution for it. >> >> The problem is easily observed using ex6.c in >> src/ksp/ksp/examples/tests. >> >> Out of the box: make runex6 produces a residual error of >> O(1e-11), all is well. >> >> I then changed the run to run on two processors and add the flag >> -pc_factor_mat_solver_package spooles this produces a >> residual error of O(1e-11), all is still well. >> >> I then switch over to -pc_factor_mat_solver_package >> superlu_dist and the >> residual error comes back as 22.6637! Something seems very wrong. >> >> My build is perfectly vanilla: >> >> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >> export PETSC_ARCH=intel >> >> ./configure --with-cc=icc --with-fc=ifort \ >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >> >> -sanjay >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jedbrown at mcs.anl.gov Sun Dec 23 20:26:17 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sun, 23 Dec 2012 20:26:17 -0600 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50D7BAC6.2050807@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BAC6.2050807@berkeley.edu> Message-ID: On Sun, Dec 23, 2012 at 8:15 PM, Sanjay Govindjee wrote: > Sorry for the confusion. I thought I was clear. Here is the make line I > was running. > > > -@${MPIEXEC} -n 2 ./ex6 -ksp_type preonly -pc_type lu > -pc_factor_mat_solver_package superlu_dist -options_left no \ > -f arco1 > ex6_1.tmp 2>&1; \ > if (${DIFF} output/ex6_1.out ex6_1.tmp) then true; \ > else echo ${PWD} ; echo "Possible problem with with ex6_1, > diffs above \n========================================="; fi; \ > ${RM} -f ex6_1.tmp > > If you change superlu_dist to spooles it works just fine as well as any > other iterative methods you care to try. The matrix arcos1 was downloaded > as per the instructions in the makefile. > I cannot reproduce your problem. Do you have a build with a different compiler (like GCC)? Also, what BLAS/LAPACK is being used? (You can send configure.log to petsc-maint at mcs.anl.gov.) > I will try reproducing the superlu_dist error with > snes/examples/tutorials/ex5 now. > This is the file Matt suggested. > (fyi under snes/examples/tests/output the files ex5_1.out and ex5_2.out > are missing one can not run the test out of the box). > Heh, this has been missing since the beginning of time (revision 0). I'll add it. > > -sanjay > > > > On 12/23/12 6:07 PM, Jed Brown wrote: > > You didn't say what options you were running ex6 with, but with the > options used for the tests, I see > > ~/petsc/src/ksp/ksp/examples/tests$ mpirun.hydra -n 2 ./ex6 -f > ~/petsc/datafiles/matrices/arco1 -pc_type lu -pc_factor_mat_solver_package > superlu_dist > Number of iterations = 1 > Residual norm = 2.23439e-11 > > > You need to give precise instructions for how to reproduce the behavior > you are seeing. > > Also, for experimenting with matrices read from files, we prefer > src/ksp/ksp/examples/tutorials/ex10.c because it is better commented and > has more features. > > > On Sun, Dec 23, 2012 at 7:08 PM, Sanjay Govindjee wrote: > >> Not sure what you mean by where is your matrix? I am simply running ex6 >> in the ksp/examples/tests directory. >> >> The reason I ran this test is because I was seeing the same behavior with >> my finite element code (on perfectly benign problems). >> >> Is there a built-in test that you use to check that superlu_dist is >> working properly with petsc? >> i.e. something you know that works with with petsc 3.3-p5? >> >> -sanjay >> >> >> >> On 12/23/12 4:56 PM, Jed Brown wrote: >> >> Where is your matrix? It might be ending up with a very bad pivot. If the >> problem can be reproduced, it should be reported to the SuperLU_DIST >> developers to fix. (Note that we do not see this with other matrices.) You >> can also try MUMPS. >> >> >> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee wrote: >> >>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>> encountering >>> a problem. I was wonder if this is a know issue and if there is a >>> solution for it. >>> >>> The problem is easily observed using ex6.c in src/ksp/ksp/examples/tests. >>> >>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>> is well. 
>>> >>> I then changed the run to run on two processors and add the flag >>> -pc_factor_mat_solver_package spooles this produces a residual error of >>> O(1e-11), all is still well. >>> >>> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >>> residual error comes back as 22.6637! Something seems very wrong. >>> >>> My build is perfectly vanilla: >>> >>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>> export PETSC_ARCH=intel >>> >>> ./configure --with-cc=icc --with-fc=ifort \ >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>> >>> -sanjay >>> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Sun Dec 23 20:37:39 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Sun, 23 Dec 2012 18:37:39 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> Message-ID: <50D7BFF3.3030909@berkeley.edu> I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how to convert the run lines for snes/examples/ex5.c to work with a direct solver as I am not versed in SNES options. Notwithstanding something strange is happening only on select examples. With ksp/ksp/exampeles/tutorials/ex2.c and the run line: -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist I get good results (of the order): Norm of error 1.85464e-14 iterations 1 using both superlu_dist and spooles. My BLAS/LAPACK: -llapack -lblas (so native to my machine). If you can guide me on a run line for the snes ex5.c I can try that too. I'll also try to construct a GCC build later to see if that is an issue. -sanjay On 12/23/12 5:58 PM, Matthew Knepley wrote: > On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee > wrote: > > Not sure what you mean by where is your matrix? I am simply > running ex6 in the ksp/examples/tests directory. > > The reason I ran this test is because I was seeing the same > behavior with my finite element code (on perfectly benign problems). > > Is there a built-in test that you use to check that superlu_dist > is working properly with petsc? > i.e. something you know that works with with petsc 3.3-p5? > > > 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian > > 2) Compare with MUMPS > > Matt > > -sanjay > > > > On 12/23/12 4:56 PM, Jed Brown wrote: >> Where is your matrix? It might be ending up with a very bad >> pivot. If the problem can be reproduced, it should be reported to >> the SuperLU_DIST developers to fix. (Note that we do not see this >> with other matrices.) You can also try MUMPS. >> >> >> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >> > wrote: >> >> I wanted to use SuperLU Dist to perform a direct solve but >> seem to be encountering >> a problem. I was wonder if this is a know issue and if there >> is a solution for it. >> >> The problem is easily observed using ex6.c in >> src/ksp/ksp/examples/tests. >> >> Out of the box: make runex6 produces a residual error of >> O(1e-11), all is well. >> >> I then changed the run to run on two processors and add the flag >> -pc_factor_mat_solver_package spooles this produces a >> residual error of O(1e-11), all is still well. 
>> >> I then switch over to -pc_factor_mat_solver_package >> superlu_dist and the >> residual error comes back as 22.6637! Something seems very wrong. >> >> My build is perfectly vanilla: >> >> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >> export PETSC_ARCH=intel >> >> ./configure --with-cc=icc --with-fc=ifort \ >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >> >> -sanjay >> >> > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice:+1 510 642 6060 > FAX:+1 510 643 5264 > s_g at berkeley.edu > http://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exercises > http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 > http://ukcatalogue.oup.com/product/9780199651641.do > http://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics) > http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 > http://amzn.com/3642140181 > > ----------------------------------------------- > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Dec 23 20:42:56 2012 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 23 Dec 2012 21:42:56 -0500 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50D7BFF3.3030909@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> Message-ID: On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee wrote: > I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how > to convert the run lines for snes/examples/ex5.c to work with a direct > solver as I am not versed in SNES options. > > Notwithstanding something strange is happening only on select examples. > With ksp/ksp/exampeles/tutorials/ex2.c and the run line: > > -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly > -pc_type lu -pc_factor_mat_solver_package superlu_dist > > I get good results (of the order): > > Norm of error 1.85464e-14 iterations 1 > > using both superlu_dist and spooles. > > My BLAS/LAPACK: -llapack -lblas (so native to my machine). > > If you can guide me on a run line for the snes ex5.c I can try that too. > I'll also try to construct a GCC build later to see if that is an issue. > Same line on ex5, but ex2 is good enough. However, it will not tell us anything new. Try another build. Matt > -sanjay > > > On 12/23/12 5:58 PM, Matthew Knepley wrote: > > On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee wrote: > >> Not sure what you mean by where is your matrix? I am simply running ex6 >> in the ksp/examples/tests directory. >> >> The reason I ran this test is because I was seeing the same behavior with >> my finite element code (on perfectly benign problems). 
>> >> Is there a built-in test that you use to check that superlu_dist is >> working properly with petsc? >> i.e. something you know that works with with petsc 3.3-p5? >> > > 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian > > 2) Compare with MUMPS > > Matt > > >> -sanjay >> >> >> >> On 12/23/12 4:56 PM, Jed Brown wrote: >> >> Where is your matrix? It might be ending up with a very bad pivot. If the >> problem can be reproduced, it should be reported to the SuperLU_DIST >> developers to fix. (Note that we do not see this with other matrices.) You >> can also try MUMPS. >> >> >> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee wrote: >> >>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>> encountering >>> a problem. I was wonder if this is a know issue and if there is a >>> solution for it. >>> >>> The problem is easily observed using ex6.c in src/ksp/ksp/examples/tests. >>> >>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>> is well. >>> >>> I then changed the run to run on two processors and add the flag >>> -pc_factor_mat_solver_package spooles this produces a residual error of >>> O(1e-11), all is still well. >>> >>> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >>> residual error comes back as 22.6637! Something seems very wrong. >>> >>> My build is perfectly vanilla: >>> >>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>> export PETSC_ARCH=intel >>> >>> ./configure --with-cc=icc --with-fc=ifort \ >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>> >>> -sanjay >>> >> >> >> -- >> ----------------------------------------------- >> Sanjay Govindjee, PhD, PE >> Professor of Civil Engineering >> Vice Chair for Academic Affairs >> >> 779 Davis Hall >> Structural Engineering, Mechanics and Materials >> Department of Civil Engineering >> University of California >> Berkeley, CA 94720-1710 >> >> Voice: +1 510 642 6060 >> FAX: +1 510 643 5264s_g at berkeley.eduhttp://www.ce.berkeley.edu/~sanjay >> ----------------------------------------------- >> >> New Books: >> >> Engineering Mechanics of Deformable >> Solids: A Presentation with Exerciseshttp://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641http://ukcatalogue.oup.com/product/9780199651641.dohttp://amzn.com/0199651647 >> >> >> Engineering Mechanics 3 (Dynamics)http://www.springer.com/materials/mechanics/book/978-3-642-14018-1http://amzn.com/3642140181 >> >> ----------------------------------------------- >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Mon Dec 24 10:58:54 2012 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 24 Dec 2012 10:58:54 -0600 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> Message-ID: Sanjay, Which version of superlu_dist do you use? 
I configured my petsc-3.3 with '--download-superlu_dist' which installs SuperLU_DIST_3.1. Then I get petsc-3.3/src/ksp/ksp/examples/tests mpiexec -n 2 ./ex6 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -options_left no -f $D/arco1 Number of iterations = 1 Residual norm = 2.00484e-11 Hong On Sun, Dec 23, 2012 at 8:42 PM, Matthew Knepley wrote: > > On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee wrote: >> >> I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how >> to convert the run lines for snes/examples/ex5.c to work with a direct >> solver as I am not versed in SNES options. >> >> Notwithstanding something strange is happening only on select examples. >> With ksp/ksp/exampeles/tutorials/ex2.c and the run line: >> >> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly >> -pc_type lu -pc_factor_mat_solver_package superlu_dist >> >> I get good results (of the order): >> >> Norm of error 1.85464e-14 iterations 1 >> >> using both superlu_dist and spooles. >> >> My BLAS/LAPACK: -llapack -lblas (so native to my machine). >> >> If you can guide me on a run line for the snes ex5.c I can try that too. >> I'll also try to construct a GCC build later to see if that is an issue. > > > Same line on ex5, but ex2 is good enough. However, it will not tell us > anything new. Try another build. > > Matt > >> >> -sanjay >> >> >> On 12/23/12 5:58 PM, Matthew Knepley wrote: >> >> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >> wrote: >>> >>> Not sure what you mean by where is your matrix? I am simply running ex6 >>> in the ksp/examples/tests directory. >>> >>> The reason I ran this test is because I was seeing the same behavior with >>> my finite element code (on perfectly benign problems). >>> >>> Is there a built-in test that you use to check that superlu_dist is >>> working properly with petsc? >>> i.e. something you know that works with with petsc 3.3-p5? >> >> >> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >> >> 2) Compare with MUMPS >> >> Matt >> >>> >>> -sanjay >>> >>> >>> >>> On 12/23/12 4:56 PM, Jed Brown wrote: >>> >>> Where is your matrix? It might be ending up with a very bad pivot. If the >>> problem can be reproduced, it should be reported to the SuperLU_DIST >>> developers to fix. (Note that we do not see this with other matrices.) You >>> can also try MUMPS. >>> >>> >>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>> wrote: >>>> >>>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>>> encountering >>>> a problem. I was wonder if this is a know issue and if there is a >>>> solution for it. >>>> >>>> The problem is easily observed using ex6.c in >>>> src/ksp/ksp/examples/tests. >>>> >>>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>>> is well. >>>> >>>> I then changed the run to run on two processors and add the flag >>>> -pc_factor_mat_solver_package spooles this produces a residual error of >>>> O(1e-11), all is still well. >>>> >>>> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >>>> residual error comes back as 22.6637! Something seems very wrong. 
>>>> >>>> My build is perfectly vanilla: >>>> >>>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>>> export PETSC_ARCH=intel >>>> >>>> ./configure --with-cc=icc --with-fc=ifort \ >>>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>>> >>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>>> >>>> -sanjay >>> >>> >>> >>> -- >>> ----------------------------------------------- >>> Sanjay Govindjee, PhD, PE >>> Professor of Civil Engineering >>> Vice Chair for Academic Affairs >>> >>> 779 Davis Hall >>> Structural Engineering, Mechanics and Materials >>> Department of Civil Engineering >>> University of California >>> Berkeley, CA 94720-1710 >>> >>> Voice: +1 510 642 6060 >>> FAX: +1 510 643 5264 >>> s_g at berkeley.edu >>> http://www.ce.berkeley.edu/~sanjay >>> ----------------------------------------------- >>> >>> New Books: >>> >>> Engineering Mechanics of Deformable >>> Solids: A Presentation with Exercises >>> >>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>> http://ukcatalogue.oup.com/product/9780199651641.do >>> http://amzn.com/0199651647 >>> >>> >>> Engineering Mechanics 3 (Dynamics) >>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>> http://amzn.com/3642140181 >>> >>> ----------------------------------------------- >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From abarua at iit.edu Wed Dec 26 01:00:50 2012 From: abarua at iit.edu (amlan barua) Date: Wed, 26 Dec 2012 01:00:50 -0600 Subject: [petsc-users] Question on TS Message-ID: Hi, Greetings to the team! I am currently using PETSc for my research. Here is a brief description of my problem and my query a) I have a set a points distributed on a 3 dimensional lattice. b) Corresponding to each point in this set, 7 odes are defined. c) Of these 7 odes, 6 are uncoupled but one is coupled to nearest neighbors. d) To integrate the odes I am using PETSc's DMDA and TS. But my application needs implicit as well as locally high order solver. I am looking for an implicit RK4 type method. Does PETSc have an IRK4 support or equivalent? e) Suppose I want to build my own implicit time stepper. Should I imitate ex2.c of SNES solver? Thanks Amlan IISER Pune, India -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Wed Dec 26 10:24:57 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Wed, 26 Dec 2012 10:24:57 -0600 Subject: [petsc-users] Question on TS In-Reply-To: References: Message-ID: On Wed, Dec 26, 2012 at 1:00 AM, amlan barua wrote: > Hi, > Greetings to the team! I am currently using PETSc for my research. Here is > a brief description of my problem and my query > a) I have a set a points distributed on a 3 dimensional lattice. > b) Corresponding to each point in this set, 7 odes are defined. > c) Of these 7 odes, 6 are uncoupled but one is coupled to nearest > neighbors. > I suggest not optimizing for "missing" coupling to start with. We can do the optimization in the solver, perhaps by splitting the DMDA into the local and coupled parts. 
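Going back to the DMDACreate3d question earlier in this digest, a minimal petsc-3.3-style sketch of Barry's 300-points-on-4-processes answer, shown in 1d for brevity; the 3d call is analogous, taking lx[], ly[], lz[] of lengths m, n, p whose entries sum to M, N, P. The grid size and the evenly split lx[] values come straight from the exchange; everything else is an assumption:

#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM             da;
  PetscInt       lx[4] = {75, 75, 75, 75}; /* 4 processes x 75 points = 300 grid points */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);
  /* Run with exactly 4 MPI processes; every rank must pass the identical lx[] array. */
  ierr = DMDACreate1d(PETSC_COMM_WORLD, DMDA_BOUNDARY_NONE, 300, 1, 1, lx, &da);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Passing PETSC_NULL instead of lx gives the default even split, exactly as Jed notes above.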
> d) To integrate the odes I am using PETSc's DMDA and TS. But my > application needs implicit as well as locally high order solver. I am > looking for an implicit RK4 type method. Does PETSc have an IRK4 support or > equivalent? > If you are happy with a diagonally implicit method, you can use TSARKIMEX (these integrators can be IMEX, but can also do any diagonally implicit method). If you want a fully implicit RK (like Gauss, Radau IIA, etc) then all stages are coupled together. Those methods are not currently implemented in PETSc, though you could implement it either as a new TS implementation (good for code reuse; you can do this outside of PETSc, but the code you write is like library code) or manually using SNES (not reusable). > e) Suppose I want to build my own implicit time stepper. Should I imitate > ex2.c of SNES solver? > Thanks > Amlan > IISER Pune, India > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Dec 26 11:02:58 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 26 Dec 2012 11:02:58 -0600 Subject: [petsc-users] Question on TS In-Reply-To: References: Message-ID: <598DFAB9-9075-4EB1-B1A7-26CCBE4414F1@mcs.anl.gov> On Dec 26, 2012, at 10:24 AM, Jed Brown wrote: > On Wed, Dec 26, 2012 at 1:00 AM, amlan barua wrote: > Hi, > Greetings to the team! I am currently using PETSc for my research. Here is a brief description of my problem and my query > a) I have a set a points distributed on a 3 dimensional lattice. > b) Corresponding to each point in this set, 7 odes are defined. > c) Of these 7 odes, 6 are uncoupled but one is coupled to nearest neighbors. > > I suggest not optimizing for "missing" coupling to start with. We can do the optimization in the solver, perhaps by splitting the DMDA into the local and coupled parts. I agree with Jed here. Coincidently I am working on a similar problem but with thousands of ODEs (mostly decoupled). You can use DMDASetBlockFills(), the ofill parameter to indicate exactly what fields are coupled to neighbors and which are not, this reduces the unneeded zero Jacobian entries (you can also use the dfill parameter to reduce unneeded zero entries in the 7 by 7 block). Eventually we'll use the same information to reduce the ghost point communication also. Barry > > d) To integrate the odes I am using PETSc's DMDA and TS. But my application needs implicit as well as locally high order solver. I am looking for an implicit RK4 type method. Does PETSc have an IRK4 support or equivalent? > > If you are happy with a diagonally implicit method, you can use TSARKIMEX (these integrators can be IMEX, but can also do any diagonally implicit method). > > If you want a fully implicit RK (like Gauss, Radau IIA, etc) then all stages are coupled together. Those methods are not currently implemented in PETSc, though you could implement it either as a new TS implementation (good for code reuse; you can do this outside of PETSc, but the code you write is like library code) or manually using SNES (not reusable). > > e) Suppose I want to build my own implicit time stepper. Should I imitate ex2.c of SNES solver? > Thanks > Amlan > IISER Pune, India > From z240w014 at ku.edu Wed Dec 26 12:05:29 2012 From: z240w014 at ku.edu (Zhenglun (Alan) Wei) Date: Wed, 26 Dec 2012 12:05:29 -0600 Subject: [petsc-users] A quick question on DMDACreate3d Message-ID: <50DB3C69.9060806@ku.edu> Dear folks, I have a quick question on the DMDACreate3d. 
In the manual, it says that the input format of this function is: PetscErrorCode DMDACreate3d(MPI_Comm comm,DMDABoundaryType bx,DMDABoundaryType by,DMDABoundaryType bz,DMDAStencilType stencil_type,PetscInt M, PetscInt N,PetscInt P,PetscInt m,PetscInt n,PetscInt p,PetscInt dof,PetscInt s,const PetscInt lx[],const PetscInt ly[],const PetscInt lz[],DM *da) Now, I'm trying to manually define the "arrays containing the number of nodes in each cell along the x, y, and z coordinates". Therefore, my focus turns to 'lx[]', 'ly[]' and 'lz[]'. I suppose that they're not simply just three integers; they may be three integer type arrays, as I guess. However, I checked all examples listed for this function. None of them teaches me how to implement this three parameters except 'PETSC_NULL'. Could you please provide me an extra example to demonstrate how to use DMDACreate3d or DMDACreate2d with non-null 'lx[]', 'ly[]' and 'lz[]'. Or, a demonstration in 1D would be a good example. Say, I have a 1D uniform mesh; the number of grid in x-direction is 300. I want to use 4 processes to evenly divide this mesh. What should I input for 'lx[]' for each process? thank you so much and Happy New Year!! :) Alan -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarua at iit.edu Wed Dec 26 12:25:28 2012 From: abarua at iit.edu (amlan barua) Date: Wed, 26 Dec 2012 12:25:28 -0600 Subject: [petsc-users] Question on TS In-Reply-To: <598DFAB9-9075-4EB1-B1A7-26CCBE4414F1@mcs.anl.gov> References: <598DFAB9-9075-4EB1-B1A7-26CCBE4414F1@mcs.anl.gov> Message-ID: Hi, Thanks to Barry and Jed. I might come back later with few other questions. Amlan On Wed, Dec 26, 2012 at 11:02 AM, Barry Smith wrote: > > On Dec 26, 2012, at 10:24 AM, Jed Brown wrote: > > > On Wed, Dec 26, 2012 at 1:00 AM, amlan barua wrote: > > Hi, > > Greetings to the team! I am currently using PETSc for my research. Here > is a brief description of my problem and my query > > a) I have a set a points distributed on a 3 dimensional lattice. > > b) Corresponding to each point in this set, 7 odes are defined. > > c) Of these 7 odes, 6 are uncoupled but one is coupled to nearest > neighbors. > > > > I suggest not optimizing for "missing" coupling to start with. We can do > the optimization in the solver, perhaps by splitting the DMDA into the > local and coupled parts. > > I agree with Jed here. Coincidently I am working on a similar problem > but with thousands of ODEs (mostly decoupled). You can use > DMDASetBlockFills(), the ofill parameter to indicate exactly what fields > are coupled to neighbors and which are not, this reduces the unneeded zero > Jacobian entries (you can also use the dfill parameter to reduce unneeded > zero entries in the 7 by 7 block). Eventually we'll use the same > information to reduce the ghost point communication also. > > Barry > > > > > d) To integrate the odes I am using PETSc's DMDA and TS. But my > application needs implicit as well as locally high order solver. I am > looking for an implicit RK4 type method. Does PETSc have an IRK4 support or > equivalent? > > > > If you are happy with a diagonally implicit method, you can use > TSARKIMEX (these integrators can be IMEX, but can also do any diagonally > implicit method). > > > > If you want a fully implicit RK (like Gauss, Radau IIA, etc) then all > stages are coupled together. 
Those methods are not currently implemented in > PETSc, though you could implement it either as a new TS implementation > (good for code reuse; you can do this outside of PETSc, but the code you > write is like library code) or manually using SNES (not reusable). > > > > e) Suppose I want to build my own implicit time stepper. Should I > imitate ex2.c of SNES solver? > > Thanks > > Amlan > > IISER Pune, India > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Dec 26 12:27:50 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 26 Dec 2012 12:27:50 -0600 Subject: [petsc-users] A quick question on DMDACreate3d In-Reply-To: <50DB3C69.9060806@ku.edu> References: <50DB3C69.9060806@ku.edu> Message-ID: <3881B9C6-0989-4CE2-8A92-8BC297DC1E2F@mcs.anl.gov> On Dec 26, 2012, at 12:05 PM, "Zhenglun (Alan) Wei" wrote: > Dear folks, > I have a quick question on the DMDACreate3d. > In the manual, it says that the input format of this function is: > PetscErrorCode DMDACreate3d(MPI_Comm comm,DMDABoundaryType bx,DMDABoundaryType by,DMDABoundaryType bz,DMDAStencilType stencil_type,PetscInt M, > PetscInt N,PetscInt P,PetscInt m,PetscInt n,PetscInt p,PetscInt dof,PetscInt s,const PetscInt lx[],const PetscInt ly[],const PetscInt lz[],DM *da) > > > Now, I'm trying to manually define the "arrays containing the number of nodes in each cell along the x, y, and z coordinates". Therefore, my focus turns to 'lx[]', 'ly[]' and 'lz[]'. I suppose that they're not simply just three integers; they may be three integer type arrays, as I guess. However, I checked all examples listed for this function. None of them teaches me how to implement this three parameters except 'PETSC_NULL'. Could you please provide me an extra example to demonstrate how to use DMDACreate3d or DMDACreate2d with non-null 'lx[]', 'ly[]' and 'lz[]'. > Or, a demonstration in 1D would be a good example. Say, I have a 1D uniform mesh; the number of grid in x-direction is 300. I want to use 4 processes to evenly divide this mesh. What should I input for 'lx[]' for each process? If you use lx of PETSC_NULL it will default to putting 75 points on each process. Manually you would declare lx[4] and set lx[0] = lx[1] = lx[2] = lx[3] =75. Note that all processes need to provide the exact same values in lx, ly and lz > > thank you so much and Happy New Year!! :) > Alan > > > From jedbrown at mcs.anl.gov Wed Dec 26 12:28:43 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Wed, 26 Dec 2012 12:28:43 -0600 Subject: [petsc-users] A quick question on DMDACreate3d In-Reply-To: <50DB3C69.9060806@ku.edu> References: <50DB3C69.9060806@ku.edu> Message-ID: On Wed, Dec 26, 2012 at 12:05 PM, Zhenglun (Alan) Wei wrote: > Dear folks, > I have a quick question on the DMDACreate3d. > In the manual, it says that the input format of this function is: > > PetscErrorCode DMDACreate3d(MPI_Comm comm,DMDABoundaryType bx,DMDABoundaryType by,DMDABoundaryType bz,DMDAStencilType stencil_type,PetscInt M, > PetscInt N,PetscInt P,PetscInt m,PetscInt n,PetscInt p,PetscInt dof,PetscInt s,const PetscInt lx[],const PetscInt ly[],const PetscInt lz[],DM *da) > > > Now, I'm trying to manually define the "arrays containing the number > of nodes in each cell along the x, y, and z coordinates". Therefore, my > focus turns to 'lx[]', 'ly[]' and 'lz[]'. I suppose that they're not simply > just three integers; they may be three integer type arrays, as I guess. > However, I checked all examples listed for this function. 
None of them > teaches me how to implement this three parameters except 'PETSC_NULL'. > Could you please provide me an extra example to demonstrate how to use > DMDACreate3d or DMDACreate2d with non-null 'lx[]', 'ly[]' and 'lz[]'. > It is used by snes/examples/tutorials/ex28.c in the 1D case to ensure that the staggered grid has a specific compatible layout. As the docs say, these are arrays of length m,n,p and must sum to M, N, and P. Or, a demonstration in 1D would be a good example. Say, I have a 1D > uniform mesh; the number of grid in x-direction is 300. I want to use 4 > processes to evenly divide this mesh. What should I input for 'lx[]' for > each process? > > The defaults do this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Wed Dec 26 15:13:54 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 26 Dec 2012 13:13:54 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> Message-ID: <50DB6892.5040402@berkeley.edu> I have done some more testing of the problem, continuing with src/ksp/ksp/examples/tutorials/ex2.c. The behavior I am seeing is that with smaller problems sizes superlu_dist is behaving properly but with larger problem sizes things seem to go wrong and what goes wrong is apparently consistent; the error appears both with my intel build as well as with my gcc build. I have two run lines: runex2superlu: -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist runex2spooles: -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles From my intel build, I get sg-macbook-prolocal:tutorials sg$ make runex2superlu Norm of error 7.66145e-13 iterations 1 sg-macbook-prolocal:tutorials sg$ make runex2spooles Norm of error 2.21422e-12 iterations 1 From my GCC build, I get sg-macbook-prolocal:tutorials sg$ make runex2superlu Norm of error 7.66145e-13 iterations 1 sg-macbook-prolocal:tutorials sg$ make runex2spooles Norm of error 2.21422e-12 iterations 1 If I change the -m 100 -n 100 to -m 500 -n 500, I get for my intel build sg-macbook-prolocal:tutorials sg$ make runex2superlu Norm of error 419.953 iterations 1 sg-macbook-prolocal:tutorials sg$ make runex2spooles Norm of error 2.69468e-10 iterations 1 From my GCC build with -m 500 -n 500, I get sg-macbook-prolocal:tutorials sg$ make runex2superlu Norm of error 419.953 iterations 1 sg-macbook-prolocal:tutorials sg$ make runex2spooles Norm of error 2.69468e-10 iterations 1 Any suggestions will be greatly appreciated. -sanjay On 12/23/12 6:42 PM, Matthew Knepley wrote: > > On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee > wrote: > > I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was > unsure how to convert the run lines for snes/examples/ex5.c to > work with a direct solver as I am not versed in SNES options. > > Notwithstanding something strange is happening only on select > examples. With ksp/ksp/exampeles/tutorials/ex2.c and the run line: > > -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type > preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist > > I get good results (of the order): > > Norm of error 1.85464e-14 iterations 1 > > using both superlu_dist and spooles. > > My BLAS/LAPACK: -llapack -lblas (so native to my machine). 
> > If you can guide me on a run line for the snes ex5.c I can try > that too. I'll also try to construct a GCC build later to see if > that is an issue. > > > Same line on ex5, but ex2 is good enough. However, it will not tell us > anything new. Try another build. > > Matt > > -sanjay > > > On 12/23/12 5:58 PM, Matthew Knepley wrote: >> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >> > wrote: >> >> Not sure what you mean by where is your matrix? I am simply >> running ex6 in the ksp/examples/tests directory. >> >> The reason I ran this test is because I was seeing the same >> behavior with my finite element code (on perfectly benign >> problems). >> >> Is there a built-in test that you use to check that >> superlu_dist is working properly with petsc? >> i.e. something you know that works with with petsc 3.3-p5? >> >> >> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >> >> 2) Compare with MUMPS >> >> Matt >> >> -sanjay >> >> >> >> On 12/23/12 4:56 PM, Jed Brown wrote: >>> Where is your matrix? It might be ending up with a very bad >>> pivot. If the problem can be reproduced, it should be >>> reported to the SuperLU_DIST developers to fix. (Note that >>> we do not see this with other matrices.) You can also try MUMPS. >>> >>> >>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>> > wrote: >>> >>> I wanted to use SuperLU Dist to perform a direct solve >>> but seem to be encountering >>> a problem. I was wonder if this is a know issue and if >>> there is a solution for it. >>> >>> The problem is easily observed using ex6.c in >>> src/ksp/ksp/examples/tests. >>> >>> Out of the box: make runex6 produces a residual error of >>> O(1e-11), all is well. >>> >>> I then changed the run to run on two processors and add >>> the flag >>> -pc_factor_mat_solver_package spooles this produces a >>> residual error of O(1e-11), all is still well. >>> >>> I then switch over to -pc_factor_mat_solver_package >>> superlu_dist and the >>> residual error comes back as 22.6637! Something seems >>> very wrong. 
>>> >>> My build is perfectly vanilla: >>> >>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>> export PETSC_ARCH=intel >>> >>> ./configure --with-cc=icc --with-fc=ifort \ >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>> >>> -sanjay >>> >>> >> >> -- >> ----------------------------------------------- >> Sanjay Govindjee, PhD, PE >> Professor of Civil Engineering >> Vice Chair for Academic Affairs >> >> 779 Davis Hall >> Structural Engineering, Mechanics and Materials >> Department of Civil Engineering >> University of California >> Berkeley, CA 94720-1710 >> >> Voice:+1 510 642 6060 >> FAX:+1 510 643 5264 >> s_g at berkeley.edu >> http://www.ce.berkeley.edu/~sanjay >> ----------------------------------------------- >> >> New Books: >> >> Engineering Mechanics of Deformable >> Solids: A Presentation with Exercises >> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >> http://ukcatalogue.oup.com/product/9780199651641.do >> http://amzn.com/0199651647 >> >> >> Engineering Mechanics 3 (Dynamics) >> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >> http://amzn.com/3642140181 >> >> ----------------------------------------------- >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed Dec 26 15:23:33 2012 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Wed, 26 Dec 2012 15:23:33 -0600 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50DB6892.5040402@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> Message-ID: Sanjay: I get petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec -n 2 ./ex2 -ksp_monitor_short -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -m 500 -n 500 Norm of error 1.92279e-11 iterations 1 Hong > I have done some more testing of the problem, continuing with > src/ksp/ksp/examples/tutorials/ex2.c. > > The behavior I am seeing is that with smaller problems sizes superlu_dist is > behaving properly > but with larger problem sizes things seem to go wrong and what goes wrong is > apparently consistent; the error appears both with my intel build as well as > with my gcc build. 
> > I have two run lines: > > runex2superlu: > -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 -ksp_type > preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist > > runex2spooles: > -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 -ksp_type > preonly -pc_type lu -pc_factor_mat_solver_package spooles > > From my intel build, I get > > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 7.66145e-13 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.21422e-12 iterations 1 > > From my GCC build, I get > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 7.66145e-13 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.21422e-12 iterations 1 > > If I change the -m 100 -n 100 to -m 500 -n 500, I get for my intel build > > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 419.953 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.69468e-10 iterations 1 > > From my GCC build with -m 500 -n 500, I get > > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 419.953 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.69468e-10 iterations 1 > > > Any suggestions will be greatly appreciated. > > -sanjay > > > > > > > > On 12/23/12 6:42 PM, Matthew Knepley wrote: > > > On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee wrote: >> >> I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how >> to convert the run lines for snes/examples/ex5.c to work with a direct >> solver as I am not versed in SNES options. >> >> Notwithstanding something strange is happening only on select examples. >> With ksp/ksp/exampeles/tutorials/ex2.c and the run line: >> >> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly >> -pc_type lu -pc_factor_mat_solver_package superlu_dist >> >> I get good results (of the order): >> >> Norm of error 1.85464e-14 iterations 1 >> >> using both superlu_dist and spooles. >> >> My BLAS/LAPACK: -llapack -lblas (so native to my machine). >> >> If you can guide me on a run line for the snes ex5.c I can try that too. >> I'll also try to construct a GCC build later to see if that is an issue. > > > Same line on ex5, but ex2 is good enough. However, it will not tell us > anything new. Try another build. > > Matt > >> >> -sanjay >> >> >> On 12/23/12 5:58 PM, Matthew Knepley wrote: >> >> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >> wrote: >>> >>> Not sure what you mean by where is your matrix? I am simply running ex6 >>> in the ksp/examples/tests directory. >>> >>> The reason I ran this test is because I was seeing the same behavior with >>> my finite element code (on perfectly benign problems). >>> >>> Is there a built-in test that you use to check that superlu_dist is >>> working properly with petsc? >>> i.e. something you know that works with with petsc 3.3-p5? >> >> >> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >> >> 2) Compare with MUMPS >> >> Matt >> >>> >>> -sanjay >>> >>> >>> >>> On 12/23/12 4:56 PM, Jed Brown wrote: >>> >>> Where is your matrix? It might be ending up with a very bad pivot. If the >>> problem can be reproduced, it should be reported to the SuperLU_DIST >>> developers to fix. (Note that we do not see this with other matrices.) You >>> can also try MUMPS. 
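For the MUMPS comparison suggested here, the same ex2 test can be rerun with only the solver package changed. This is a sketch and assumes PETSc was configured with --download-mumps, which in this release also needs ScaLAPACK (e.g. --download-scalapack):

    ${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 500 -n 500 -ksp_type preonly \
        -pc_type lu -pc_factor_mat_solver_package mumps

If MUMPS and spooles both give small errors while superlu_dist does not, that isolates the problem to the superlu_dist path rather than to the matrix or the test harness.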
>>> >>> >>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>> wrote: >>>> >>>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>>> encountering >>>> a problem. I was wonder if this is a know issue and if there is a >>>> solution for it. >>>> >>>> The problem is easily observed using ex6.c in >>>> src/ksp/ksp/examples/tests. >>>> >>>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>>> is well. >>>> >>>> I then changed the run to run on two processors and add the flag >>>> -pc_factor_mat_solver_package spooles this produces a residual error of >>>> O(1e-11), all is still well. >>>> >>>> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >>>> residual error comes back as 22.6637! Something seems very wrong. >>>> >>>> My build is perfectly vanilla: >>>> >>>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>>> export PETSC_ARCH=intel >>>> >>>> ./configure --with-cc=icc --with-fc=ifort \ >>>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>>> >>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>>> >>>> -sanjay >>> >>> >>> >>> -- >>> ----------------------------------------------- >>> Sanjay Govindjee, PhD, PE >>> Professor of Civil Engineering >>> Vice Chair for Academic Affairs >>> >>> 779 Davis Hall >>> Structural Engineering, Mechanics and Materials >>> Department of Civil Engineering >>> University of California >>> Berkeley, CA 94720-1710 >>> >>> Voice: +1 510 642 6060 >>> FAX: +1 510 643 5264 >>> s_g at berkeley.edu >>> http://www.ce.berkeley.edu/~sanjay >>> ----------------------------------------------- >>> >>> New Books: >>> >>> Engineering Mechanics of Deformable >>> Solids: A Presentation with Exercises >>> >>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>> http://ukcatalogue.oup.com/product/9780199651641.do >>> http://amzn.com/0199651647 >>> >>> >>> Engineering Mechanics 3 (Dynamics) >>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>> http://amzn.com/3642140181 >>> >>> ----------------------------------------------- >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > > From s_g at berkeley.edu Wed Dec 26 15:28:37 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 26 Dec 2012 13:28:37 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> Message-ID: <50DB6C05.4090006@berkeley.edu> hmmm....I guess that is good news -- in that superlu is not broken. However, for me not so good news since I seems that there is nasty bug lurking on my machine. Any suggestions on chasing down the error? 
On 12/26/12 1:23 PM, Hong Zhang wrote: > Sanjay: > I get > petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec -n 2 ./ex2 > -ksp_monitor_short -ksp_type preonly -pc_type lu > -pc_factor_mat_solver_package superlu_dist -m 500 -n 500 > Norm of error 1.92279e-11 iterations 1 > > Hong > >> I have done some more testing of the problem, continuing with >> src/ksp/ksp/examples/tutorials/ex2.c. >> >> The behavior I am seeing is that with smaller problems sizes superlu_dist is >> behaving properly >> but with larger problem sizes things seem to go wrong and what goes wrong is >> apparently consistent; the error appears both with my intel build as well as >> with my gcc build. >> >> I have two run lines: >> >> runex2superlu: >> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 -ksp_type >> preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist >> >> runex2spooles: >> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 -ksp_type >> preonly -pc_type lu -pc_factor_mat_solver_package spooles >> >> From my intel build, I get >> >> sg-macbook-prolocal:tutorials sg$ make runex2superlu >> Norm of error 7.66145e-13 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make runex2spooles >> Norm of error 2.21422e-12 iterations 1 >> >> From my GCC build, I get >> sg-macbook-prolocal:tutorials sg$ make runex2superlu >> Norm of error 7.66145e-13 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make runex2spooles >> Norm of error 2.21422e-12 iterations 1 >> >> If I change the -m 100 -n 100 to -m 500 -n 500, I get for my intel build >> >> sg-macbook-prolocal:tutorials sg$ make runex2superlu >> Norm of error 419.953 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make runex2spooles >> Norm of error 2.69468e-10 iterations 1 >> >> From my GCC build with -m 500 -n 500, I get >> >> sg-macbook-prolocal:tutorials sg$ make runex2superlu >> Norm of error 419.953 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make runex2spooles >> Norm of error 2.69468e-10 iterations 1 >> >> >> Any suggestions will be greatly appreciated. >> >> -sanjay >> >> >> >> >> >> >> >> On 12/23/12 6:42 PM, Matthew Knepley wrote: >> >> >> On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee wrote: >>> I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how >>> to convert the run lines for snes/examples/ex5.c to work with a direct >>> solver as I am not versed in SNES options. >>> >>> Notwithstanding something strange is happening only on select examples. >>> With ksp/ksp/exampeles/tutorials/ex2.c and the run line: >>> >>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly >>> -pc_type lu -pc_factor_mat_solver_package superlu_dist >>> >>> I get good results (of the order): >>> >>> Norm of error 1.85464e-14 iterations 1 >>> >>> using both superlu_dist and spooles. >>> >>> My BLAS/LAPACK: -llapack -lblas (so native to my machine). >>> >>> If you can guide me on a run line for the snes ex5.c I can try that too. >>> I'll also try to construct a GCC build later to see if that is an issue. >> >> Same line on ex5, but ex2 is good enough. However, it will not tell us >> anything new. Try another build. >> >> Matt >> >>> -sanjay >>> >>> >>> On 12/23/12 5:58 PM, Matthew Knepley wrote: >>> >>> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >>> wrote: >>>> Not sure what you mean by where is your matrix? I am simply running ex6 >>>> in the ksp/examples/tests directory. 
>>>> >>>> The reason I ran this test is because I was seeing the same behavior with >>>> my finite element code (on perfectly benign problems). >>>> >>>> Is there a built-in test that you use to check that superlu_dist is >>>> working properly with petsc? >>>> i.e. something you know that works with with petsc 3.3-p5? >>> >>> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >>> >>> 2) Compare with MUMPS >>> >>> Matt >>> >>>> -sanjay >>>> >>>> >>>> >>>> On 12/23/12 4:56 PM, Jed Brown wrote: >>>> >>>> Where is your matrix? It might be ending up with a very bad pivot. If the >>>> problem can be reproduced, it should be reported to the SuperLU_DIST >>>> developers to fix. (Note that we do not see this with other matrices.) You >>>> can also try MUMPS. >>>> >>>> >>>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>>> wrote: >>>>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>>>> encountering >>>>> a problem. I was wonder if this is a know issue and if there is a >>>>> solution for it. >>>>> >>>>> The problem is easily observed using ex6.c in >>>>> src/ksp/ksp/examples/tests. >>>>> >>>>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>>>> is well. >>>>> >>>>> I then changed the run to run on two processors and add the flag >>>>> -pc_factor_mat_solver_package spooles this produces a residual error of >>>>> O(1e-11), all is still well. >>>>> >>>>> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >>>>> residual error comes back as 22.6637! Something seems very wrong. >>>>> >>>>> My build is perfectly vanilla: >>>>> >>>>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>>>> export PETSC_ARCH=intel >>>>> >>>>> ./configure --with-cc=icc --with-fc=ifort \ >>>>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>>>> >>>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>>>> >>>>> -sanjay >>>> >>>> >>>> -- >>>> ----------------------------------------------- >>>> Sanjay Govindjee, PhD, PE >>>> Professor of Civil Engineering >>>> Vice Chair for Academic Affairs >>>> >>>> 779 Davis Hall >>>> Structural Engineering, Mechanics and Materials >>>> Department of Civil Engineering >>>> University of California >>>> Berkeley, CA 94720-1710 >>>> >>>> Voice: +1 510 642 6060 >>>> FAX: +1 510 643 5264 >>>> s_g at berkeley.edu >>>> http://www.ce.berkeley.edu/~sanjay >>>> ----------------------------------------------- >>>> >>>> New Books: >>>> >>>> Engineering Mechanics of Deformable >>>> Solids: A Presentation with Exercises >>>> >>>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>>> http://ukcatalogue.oup.com/product/9780199651641.do >>>> http://amzn.com/0199651647 >>>> >>>> >>>> Engineering Mechanics 3 (Dynamics) >>>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>>> http://amzn.com/3642140181 >>>> >>>> ----------------------------------------------- >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their experiments >> is infinitely more interesting than any results to which their experiments >> lead. 
>> -- Norbert Wiener >> >> -- ----------------------------------------------- Sanjay Govindjee, PhD, PE Professor of Civil Engineering Vice Chair for Academic Affairs 779 Davis Hall Structural Engineering, Mechanics and Materials Department of Civil Engineering University of California Berkeley, CA 94720-1710 Voice: +1 510 642 6060 FAX: +1 510 643 5264 s_g at berkeley.edu http://www.ce.berkeley.edu/~sanjay ----------------------------------------------- New Books: Engineering Mechanics of Deformable Solids: A Presentation with Exercises http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 http://ukcatalogue.oup.com/product/9780199651641.do http://amzn.com/0199651647 Engineering Mechanics 3 (Dynamics) http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 http://amzn.com/3642140181 ----------------------------------------------- From hzhang at mcs.anl.gov Wed Dec 26 15:34:38 2012 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Wed, 26 Dec 2012 15:34:38 -0600 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50DB6C05.4090006@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> Message-ID: Sanjay: > hmmm....I guess that is good news -- in that superlu is not broken. However, > for me > not so good news since I seems that there is nasty bug lurking on my > machine. > > Any suggestions on chasing down the error? How did you install your supelu_dist with petsc-3.3? What machine do you use? Hong > > > On 12/26/12 1:23 PM, Hong Zhang wrote: >> >> Sanjay: >> I get >> petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec -n 2 ./ex2 >> -ksp_monitor_short -ksp_type preonly -pc_type lu >> -pc_factor_mat_solver_package superlu_dist -m 500 -n 500 >> Norm of error 1.92279e-11 iterations 1 >> >> Hong >> >>> I have done some more testing of the problem, continuing with >>> src/ksp/ksp/examples/tutorials/ex2.c. >>> >>> The behavior I am seeing is that with smaller problems sizes superlu_dist >>> is >>> behaving properly >>> but with larger problem sizes things seem to go wrong and what goes wrong >>> is >>> apparently consistent; the error appears both with my intel build as well >>> as >>> with my gcc build. 
>>> >>> I have two run lines: >>> >>> runex2superlu: >>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 >>> -ksp_type >>> preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist >>> >>> runex2spooles: >>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 >>> -ksp_type >>> preonly -pc_type lu -pc_factor_mat_solver_package spooles >>> >>> From my intel build, I get >>> >>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>> Norm of error 7.66145e-13 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>> Norm of error 2.21422e-12 iterations 1 >>> >>> From my GCC build, I get >>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>> Norm of error 7.66145e-13 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>> Norm of error 2.21422e-12 iterations 1 >>> >>> If I change the -m 100 -n 100 to -m 500 -n 500, I get for my intel build >>> >>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>> Norm of error 419.953 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>> Norm of error 2.69468e-10 iterations 1 >>> >>> From my GCC build with -m 500 -n 500, I get >>> >>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>> Norm of error 419.953 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>> Norm of error 2.69468e-10 iterations 1 >>> >>> >>> Any suggestions will be greatly appreciated. >>> >>> -sanjay >>> >>> >>> >>> >>> >>> >>> >>> On 12/23/12 6:42 PM, Matthew Knepley wrote: >>> >>> >>> On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee >>> wrote: >>>> >>>> I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how >>>> to convert the run lines for snes/examples/ex5.c to work with a direct >>>> solver as I am not versed in SNES options. >>>> >>>> Notwithstanding something strange is happening only on select examples. >>>> With ksp/ksp/exampeles/tutorials/ex2.c and the run line: >>>> >>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly >>>> -pc_type lu -pc_factor_mat_solver_package superlu_dist >>>> >>>> I get good results (of the order): >>>> >>>> Norm of error 1.85464e-14 iterations 1 >>>> >>>> using both superlu_dist and spooles. >>>> >>>> My BLAS/LAPACK: -llapack -lblas (so native to my machine). >>>> >>>> If you can guide me on a run line for the snes ex5.c I can try that too. >>>> I'll also try to construct a GCC build later to see if that is an issue. >>> >>> >>> Same line on ex5, but ex2 is good enough. However, it will not tell us >>> anything new. Try another build. >>> >>> Matt >>> >>>> -sanjay >>>> >>>> >>>> On 12/23/12 5:58 PM, Matthew Knepley wrote: >>>> >>>> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >>>> wrote: >>>>> >>>>> Not sure what you mean by where is your matrix? I am simply running >>>>> ex6 >>>>> in the ksp/examples/tests directory. >>>>> >>>>> The reason I ran this test is because I was seeing the same behavior >>>>> with >>>>> my finite element code (on perfectly benign problems). >>>>> >>>>> Is there a built-in test that you use to check that superlu_dist is >>>>> working properly with petsc? >>>>> i.e. something you know that works with with petsc 3.3-p5? >>>> >>>> >>>> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >>>> >>>> 2) Compare with MUMPS >>>> >>>> Matt >>>> >>>>> -sanjay >>>>> >>>>> >>>>> >>>>> On 12/23/12 4:56 PM, Jed Brown wrote: >>>>> >>>>> Where is your matrix? It might be ending up with a very bad pivot. 
If >>>>> the >>>>> problem can be reproduced, it should be reported to the SuperLU_DIST >>>>> developers to fix. (Note that we do not see this with other matrices.) >>>>> You >>>>> can also try MUMPS. >>>>> >>>>> >>>>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>>>> wrote: >>>>>> >>>>>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>>>>> encountering >>>>>> a problem. I was wonder if this is a know issue and if there is a >>>>>> solution for it. >>>>>> >>>>>> The problem is easily observed using ex6.c in >>>>>> src/ksp/ksp/examples/tests. >>>>>> >>>>>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>>>>> is well. >>>>>> >>>>>> I then changed the run to run on two processors and add the flag >>>>>> -pc_factor_mat_solver_package spooles this produces a residual error >>>>>> of >>>>>> O(1e-11), all is still well. >>>>>> >>>>>> I then switch over to -pc_factor_mat_solver_package superlu_dist and >>>>>> the >>>>>> residual error comes back as 22.6637! Something seems very wrong. >>>>>> >>>>>> My build is perfectly vanilla: >>>>>> >>>>>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>>>>> export PETSC_ARCH=intel >>>>>> >>>>>> ./configure --with-cc=icc --with-fc=ifort \ >>>>>> >>>>>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>>>>> >>>>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>>>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>>>>> >>>>>> -sanjay >>>>> >>>>> >>>>> >>>>> -- >>>>> ----------------------------------------------- >>>>> Sanjay Govindjee, PhD, PE >>>>> Professor of Civil Engineering >>>>> Vice Chair for Academic Affairs >>>>> >>>>> 779 Davis Hall >>>>> Structural Engineering, Mechanics and Materials >>>>> Department of Civil Engineering >>>>> University of California >>>>> Berkeley, CA 94720-1710 >>>>> >>>>> Voice: +1 510 642 6060 >>>>> FAX: +1 510 643 5264 >>>>> s_g at berkeley.edu >>>>> http://www.ce.berkeley.edu/~sanjay >>>>> ----------------------------------------------- >>>>> >>>>> New Books: >>>>> >>>>> Engineering Mechanics of Deformable >>>>> Solids: A Presentation with Exercises >>>>> >>>>> >>>>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>>>> http://ukcatalogue.oup.com/product/9780199651641.do >>>>> http://amzn.com/0199651647 >>>>> >>>>> >>>>> Engineering Mechanics 3 (Dynamics) >>>>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>>>> http://amzn.com/3642140181 >>>>> >>>>> ----------------------------------------------- >>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which >>>> their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments >>> is infinitely more interesting than any results to which their >>> experiments >>> lead. 
>>> -- Norbert Wiener >>> >>> > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice: +1 510 642 6060 > FAX: +1 510 643 5264 > s_g at berkeley.edu > http://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exercises > http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 > http://ukcatalogue.oup.com/product/9780199651641.do > http://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics) > http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 > http://amzn.com/3642140181 > > ----------------------------------------------- > From s_g at berkeley.edu Wed Dec 26 15:38:24 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 26 Dec 2012 13:38:24 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> Message-ID: <50DB6E50.3050001@berkeley.edu> I have a macbook pro (Mac OS X 10.7.5) % uname -a Darwin sg-macbook-prolocal.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 I configured using: ./configure --with-cc=icc --with-fc=ifort -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} so everything was built together. On 12/26/12 1:34 PM, Hong Zhang wrote: > Sanjay: >> hmmm....I guess that is good news -- in that superlu is not broken. However, >> for me >> not so good news since I seems that there is nasty bug lurking on my >> machine. >> >> Any suggestions on chasing down the error? > How did you install your supelu_dist with petsc-3.3? > What machine do you use? > > Hong >> >> On 12/26/12 1:23 PM, Hong Zhang wrote: >>> Sanjay: >>> I get >>> petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec -n 2 ./ex2 >>> -ksp_monitor_short -ksp_type preonly -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist -m 500 -n 500 >>> Norm of error 1.92279e-11 iterations 1 >>> >>> Hong >>> >>>> I have done some more testing of the problem, continuing with >>>> src/ksp/ksp/examples/tutorials/ex2.c. >>>> >>>> The behavior I am seeing is that with smaller problems sizes superlu_dist >>>> is >>>> behaving properly >>>> but with larger problem sizes things seem to go wrong and what goes wrong >>>> is >>>> apparently consistent; the error appears both with my intel build as well >>>> as >>>> with my gcc build. 
>>>> >>>> I have two run lines: >>>> >>>> runex2superlu: >>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 >>>> -ksp_type >>>> preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist >>>> >>>> runex2spooles: >>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 >>>> -ksp_type >>>> preonly -pc_type lu -pc_factor_mat_solver_package spooles >>>> >>>> From my intel build, I get >>>> >>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>> Norm of error 7.66145e-13 iterations 1 >>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>> Norm of error 2.21422e-12 iterations 1 >>>> >>>> From my GCC build, I get >>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>> Norm of error 7.66145e-13 iterations 1 >>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>> Norm of error 2.21422e-12 iterations 1 >>>> >>>> If I change the -m 100 -n 100 to -m 500 -n 500, I get for my intel build >>>> >>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>> Norm of error 419.953 iterations 1 >>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>> Norm of error 2.69468e-10 iterations 1 >>>> >>>> From my GCC build with -m 500 -n 500, I get >>>> >>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>> Norm of error 419.953 iterations 1 >>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>> Norm of error 2.69468e-10 iterations 1 >>>> >>>> >>>> Any suggestions will be greatly appreciated. >>>> >>>> -sanjay >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 12/23/12 6:42 PM, Matthew Knepley wrote: >>>> >>>> >>>> On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee >>>> wrote: >>>>> I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how >>>>> to convert the run lines for snes/examples/ex5.c to work with a direct >>>>> solver as I am not versed in SNES options. >>>>> >>>>> Notwithstanding something strange is happening only on select examples. >>>>> With ksp/ksp/exampeles/tutorials/ex2.c and the run line: >>>>> >>>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly >>>>> -pc_type lu -pc_factor_mat_solver_package superlu_dist >>>>> >>>>> I get good results (of the order): >>>>> >>>>> Norm of error 1.85464e-14 iterations 1 >>>>> >>>>> using both superlu_dist and spooles. >>>>> >>>>> My BLAS/LAPACK: -llapack -lblas (so native to my machine). >>>>> >>>>> If you can guide me on a run line for the snes ex5.c I can try that too. >>>>> I'll also try to construct a GCC build later to see if that is an issue. >>>> >>>> Same line on ex5, but ex2 is good enough. However, it will not tell us >>>> anything new. Try another build. >>>> >>>> Matt >>>> >>>>> -sanjay >>>>> >>>>> >>>>> On 12/23/12 5:58 PM, Matthew Knepley wrote: >>>>> >>>>> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >>>>> wrote: >>>>>> Not sure what you mean by where is your matrix? I am simply running >>>>>> ex6 >>>>>> in the ksp/examples/tests directory. >>>>>> >>>>>> The reason I ran this test is because I was seeing the same behavior >>>>>> with >>>>>> my finite element code (on perfectly benign problems). >>>>>> >>>>>> Is there a built-in test that you use to check that superlu_dist is >>>>>> working properly with petsc? >>>>>> i.e. something you know that works with with petsc 3.3-p5? >>>>> >>>>> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >>>>> >>>>> 2) Compare with MUMPS >>>>> >>>>> Matt >>>>> >>>>>> -sanjay >>>>>> >>>>>> >>>>>> >>>>>> On 12/23/12 4:56 PM, Jed Brown wrote: >>>>>> >>>>>> Where is your matrix? 
It might be ending up with a very bad pivot. If >>>>>> the >>>>>> problem can be reproduced, it should be reported to the SuperLU_DIST >>>>>> developers to fix. (Note that we do not see this with other matrices.) >>>>>> You >>>>>> can also try MUMPS. >>>>>> >>>>>> >>>>>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>>>>> wrote: >>>>>>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>>>>>> encountering >>>>>>> a problem. I was wonder if this is a know issue and if there is a >>>>>>> solution for it. >>>>>>> >>>>>>> The problem is easily observed using ex6.c in >>>>>>> src/ksp/ksp/examples/tests. >>>>>>> >>>>>>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>>>>>> is well. >>>>>>> >>>>>>> I then changed the run to run on two processors and add the flag >>>>>>> -pc_factor_mat_solver_package spooles this produces a residual error >>>>>>> of >>>>>>> O(1e-11), all is still well. >>>>>>> >>>>>>> I then switch over to -pc_factor_mat_solver_package superlu_dist and >>>>>>> the >>>>>>> residual error comes back as 22.6637! Something seems very wrong. >>>>>>> >>>>>>> My build is perfectly vanilla: >>>>>>> >>>>>>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>>>>>> export PETSC_ARCH=intel >>>>>>> >>>>>>> ./configure --with-cc=icc --with-fc=ifort \ >>>>>>> >>>>>>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>>>>>> >>>>>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>>>>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>>>>>> >>>>>>> -sanjay >>>>>> >>>>>> >>>>>> -- >>>>>> ----------------------------------------------- >>>>>> Sanjay Govindjee, PhD, PE >>>>>> Professor of Civil Engineering >>>>>> Vice Chair for Academic Affairs >>>>>> >>>>>> 779 Davis Hall >>>>>> Structural Engineering, Mechanics and Materials >>>>>> Department of Civil Engineering >>>>>> University of California >>>>>> Berkeley, CA 94720-1710 >>>>>> >>>>>> Voice: +1 510 642 6060 >>>>>> FAX: +1 510 643 5264 >>>>>> s_g at berkeley.edu >>>>>> http://www.ce.berkeley.edu/~sanjay >>>>>> ----------------------------------------------- >>>>>> >>>>>> New Books: >>>>>> >>>>>> Engineering Mechanics of Deformable >>>>>> Solids: A Presentation with Exercises >>>>>> >>>>>> >>>>>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>>>>> http://ukcatalogue.oup.com/product/9780199651641.do >>>>>> http://amzn.com/0199651647 >>>>>> >>>>>> >>>>>> Engineering Mechanics 3 (Dynamics) >>>>>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>>>>> http://amzn.com/3642140181 >>>>>> >>>>>> ----------------------------------------------- >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which >>>>> their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments >>>> is infinitely more interesting than any results to which their >>>> experiments >>>> lead. 
>>>> -- Norbert Wiener >>>> >>>> >> -- >> ----------------------------------------------- >> Sanjay Govindjee, PhD, PE >> Professor of Civil Engineering >> Vice Chair for Academic Affairs >> >> 779 Davis Hall >> Structural Engineering, Mechanics and Materials >> Department of Civil Engineering >> University of California >> Berkeley, CA 94720-1710 >> >> Voice: +1 510 642 6060 >> FAX: +1 510 643 5264 >> s_g at berkeley.edu >> http://www.ce.berkeley.edu/~sanjay >> ----------------------------------------------- >> >> New Books: >> >> Engineering Mechanics of Deformable >> Solids: A Presentation with Exercises >> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >> http://ukcatalogue.oup.com/product/9780199651641.do >> http://amzn.com/0199651647 >> >> >> Engineering Mechanics 3 (Dynamics) >> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >> http://amzn.com/3642140181 >> >> ----------------------------------------------- >> -- ----------------------------------------------- Sanjay Govindjee, PhD, PE Professor of Civil Engineering Vice Chair for Academic Affairs 779 Davis Hall Structural Engineering, Mechanics and Materials Department of Civil Engineering University of California Berkeley, CA 94720-1710 Voice: +1 510 642 6060 FAX: +1 510 643 5264 s_g at berkeley.edu http://www.ce.berkeley.edu/~sanjay ----------------------------------------------- New Books: Engineering Mechanics of Deformable Solids: A Presentation with Exercises http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 http://ukcatalogue.oup.com/product/9780199651641.do http://amzn.com/0199651647 Engineering Mechanics 3 (Dynamics) http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 http://amzn.com/3642140181 ----------------------------------------------- From knepley at gmail.com Wed Dec 26 17:08:07 2012 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 26 Dec 2012 18:08:07 -0500 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50DB6E50.3050001@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> <50DB6E50.3050001@berkeley.edu> Message-ID: On Wed, Dec 26, 2012 at 4:38 PM, Sanjay Govindjee wrote: > I have a macbook pro (Mac OS X 10.7.5) > > % uname -a > Darwin sg-macbook-prolocal.local 11.4.2 Darwin Kernel Version 11.4.2: Thu > Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_**X86_64 x86_64 > > I configured using: > > > ./configure --with-cc=icc --with-fc=ifort -download-{spooles,parmetis,** > superlu_dist,prometheus,mpich,**ml,hypre,metis} > > so everything was built together. Since a) you have tried other compilers b) we cannot reproduce it c) we are building the library during configure I would guess that some outside library, in your default link path, is contaminating the executable with symbols which override some of those in SuperLU. The SuperLU people are not super careful about naming. Could you 1) Try this same exercise using --with-shared-libraries 2) Once you do that, use otool -L on the executable so we can see where everything comes from Thanks, Matt > On 12/26/12 1:34 PM, Hong Zhang wrote: > >> Sanjay: >> >>> hmmm....I guess that is good news -- in that superlu is not broken. >>> However, >>> for me >>> not so good news since I seems that there is nasty bug lurking on my >>> machine. 
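Concretely, the two checks Matt asks for amount to something like the following sketch. It keeps the same --download options as the builds above and only switches to shared libraries; the PETSC_ARCH name is arbitrary, and PETSC_DIR is assumed to be set as in the earlier builds:

    ./configure PETSC_ARCH=gnu_shared --with-shared-libraries \
        -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis}
    make PETSC_ARCH=gnu_shared all
    cd src/ksp/ksp/examples/tutorials && make PETSC_ARCH=gnu_shared ex2
    otool -L ex2    # lists the dylibs the executable resolves, so anything picked up from outside the PETSC_ARCH tree shows up here

With a static build, otool -L only reports the system dylibs, so the shared rebuild is what makes the library resolution visible.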
>>> >>> Any suggestions on chasing down the error? >>> >> How did you install your supelu_dist with petsc-3.3? >> What machine do you use? >> >> Hong >> >>> >>> On 12/26/12 1:23 PM, Hong Zhang wrote: >>> >>>> Sanjay: >>>> I get >>>> petsc-3.3/src/ksp/ksp/**examples/tutorials>mpiexec -n 2 ./ex2 >>>> -ksp_monitor_short -ksp_type preonly -pc_type lu >>>> -pc_factor_mat_solver_package superlu_dist -m 500 -n 500 >>>> Norm of error 1.92279e-11 iterations 1 >>>> >>>> Hong >>>> >>>> I have done some more testing of the problem, continuing with >>>>> src/ksp/ksp/examples/**tutorials/ex2.c. >>>>> >>>>> The behavior I am seeing is that with smaller problems sizes >>>>> superlu_dist >>>>> is >>>>> behaving properly >>>>> but with larger problem sizes things seem to go wrong and what goes >>>>> wrong >>>>> is >>>>> apparently consistent; the error appears both with my intel build as >>>>> well >>>>> as >>>>> with my gcc build. >>>>> >>>>> I have two run lines: >>>>> >>>>> runex2superlu: >>>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 >>>>> -ksp_type >>>>> preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist >>>>> >>>>> runex2spooles: >>>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 >>>>> -ksp_type >>>>> preonly -pc_type lu -pc_factor_mat_solver_package spooles >>>>> >>>>> From my intel build, I get >>>>> >>>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>>> Norm of error 7.66145e-13 iterations 1 >>>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>>> Norm of error 2.21422e-12 iterations 1 >>>>> >>>>> From my GCC build, I get >>>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>>> Norm of error 7.66145e-13 iterations 1 >>>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>>> Norm of error 2.21422e-12 iterations 1 >>>>> >>>>> If I change the -m 100 -n 100 to -m 500 -n 500, I get for my intel >>>>> build >>>>> >>>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>>> Norm of error 419.953 iterations 1 >>>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>>> Norm of error 2.69468e-10 iterations 1 >>>>> >>>>> From my GCC build with -m 500 -n 500, I get >>>>> >>>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>>> Norm of error 419.953 iterations 1 >>>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>>> Norm of error 2.69468e-10 iterations 1 >>>>> >>>>> >>>>> Any suggestions will be greatly appreciated. >>>>> >>>>> -sanjay >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 12/23/12 6:42 PM, Matthew Knepley wrote: >>>>> >>>>> >>>>> On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee >>>>> wrote: >>>>> >>>>>> I decided to go with ksp/ksp/exampeles/tutorials/**ex2.c; I was >>>>>> unsure how >>>>>> to convert the run lines for snes/examples/ex5.c to work with a direct >>>>>> solver as I am not versed in SNES options. >>>>>> >>>>>> Notwithstanding something strange is happening only on select >>>>>> examples. >>>>>> With ksp/ksp/exampeles/tutorials/**ex2.c and the run line: >>>>>> >>>>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type >>>>>> preonly >>>>>> -pc_type lu -pc_factor_mat_solver_package superlu_dist >>>>>> >>>>>> I get good results (of the order): >>>>>> >>>>>> Norm of error 1.85464e-14 iterations 1 >>>>>> >>>>>> using both superlu_dist and spooles. >>>>>> >>>>>> My BLAS/LAPACK: -llapack -lblas (so native to my machine). >>>>>> >>>>>> If you can guide me on a run line for the snes ex5.c I can try that >>>>>> too. 
>>>>>> I'll also try to construct a GCC build later to see if that is an >>>>>> issue. >>>>>> >>>>> >>>>> Same line on ex5, but ex2 is good enough. However, it will not tell us >>>>> anything new. Try another build. >>>>> >>>>> Matt >>>>> >>>>> -sanjay >>>>>> >>>>>> >>>>>> On 12/23/12 5:58 PM, Matthew Knepley wrote: >>>>>> >>>>>> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >>>>>> wrote: >>>>>> >>>>>>> Not sure what you mean by where is your matrix? I am simply running >>>>>>> ex6 >>>>>>> in the ksp/examples/tests directory. >>>>>>> >>>>>>> The reason I ran this test is because I was seeing the same behavior >>>>>>> with >>>>>>> my finite element code (on perfectly benign problems). >>>>>>> >>>>>>> Is there a built-in test that you use to check that superlu_dist is >>>>>>> working properly with petsc? >>>>>>> i.e. something you know that works with with petsc 3.3-p5? >>>>>>> >>>>>> >>>>>> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >>>>>> >>>>>> 2) Compare with MUMPS >>>>>> >>>>>> Matt >>>>>> >>>>>> -sanjay >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 12/23/12 4:56 PM, Jed Brown wrote: >>>>>>> >>>>>>> Where is your matrix? It might be ending up with a very bad pivot. If >>>>>>> the >>>>>>> problem can be reproduced, it should be reported to the SuperLU_DIST >>>>>>> developers to fix. (Note that we do not see this with other >>>>>>> matrices.) >>>>>>> You >>>>>>> can also try MUMPS. >>>>>>> >>>>>>> >>>>>>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>>>>>> wrote: >>>>>>> >>>>>>>> I wanted to use SuperLU Dist to perform a direct solve but seem to >>>>>>>> be >>>>>>>> encountering >>>>>>>> a problem. I was wonder if this is a know issue and if there is a >>>>>>>> solution for it. >>>>>>>> >>>>>>>> The problem is easily observed using ex6.c in >>>>>>>> src/ksp/ksp/examples/tests. >>>>>>>> >>>>>>>> Out of the box: make runex6 produces a residual error of O(1e-11), >>>>>>>> all >>>>>>>> is well. >>>>>>>> >>>>>>>> I then changed the run to run on two processors and add the flag >>>>>>>> -pc_factor_mat_solver_package spooles this produces a residual >>>>>>>> error >>>>>>>> of >>>>>>>> O(1e-11), all is still well. >>>>>>>> >>>>>>>> I then switch over to -pc_factor_mat_solver_package superlu_dist and >>>>>>>> the >>>>>>>> residual error comes back as 22.6637! Something seems very wrong. 
>>>>>>>> >>>>>>>> My build is perfectly vanilla: >>>>>>>> >>>>>>>> export PETSC_DIR=/Users/sg/petsc-3.3-**p5/ >>>>>>>> export PETSC_ARCH=intel >>>>>>>> >>>>>>>> ./configure --with-cc=icc --with-fc=ifort \ >>>>>>>> >>>>>>>> -download-{spooles,parmetis,**superlu_dist,prometheus,mpich,** >>>>>>>> ml,hypre,metis} >>>>>>>> >>>>>>>> make PETSC_DIR=/Users/sg/petsc-3.3-**p5/ PETSC_ARCH=intel all >>>>>>>> make PETSC_DIR=/Users/sg/petsc-3.3-**p5/ PETSC_ARCH=intel test >>>>>>>> >>>>>>>> -sanjay >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ------------------------------**----------------- >>>>>>> Sanjay Govindjee, PhD, PE >>>>>>> Professor of Civil Engineering >>>>>>> Vice Chair for Academic Affairs >>>>>>> >>>>>>> 779 Davis Hall >>>>>>> Structural Engineering, Mechanics and Materials >>>>>>> Department of Civil Engineering >>>>>>> University of California >>>>>>> Berkeley, CA 94720-1710 >>>>>>> >>>>>>> Voice: +1 510 642 6060 >>>>>>> FAX: +1 510 643 5264 >>>>>>> s_g at berkeley.edu >>>>>>> http://www.ce.berkeley.edu/~**sanjay >>>>>>> ------------------------------**----------------- >>>>>>> >>>>>>> New Books: >>>>>>> >>>>>>> Engineering Mechanics of Deformable >>>>>>> Solids: A Presentation with Exercises >>>>>>> >>>>>>> >>>>>>> http://www.oup.com/us/catalog/**general/subject/Physics/** >>>>>>> MaterialsScience/?view=usa&ci=**9780199651641 >>>>>>> http://ukcatalogue.oup.com/**product/9780199651641.do >>>>>>> http://amzn.com/0199651647 >>>>>>> >>>>>>> >>>>>>> Engineering Mechanics 3 (Dynamics) >>>>>>> http://www.springer.com/**materials/mechanics/book/978-** >>>>>>> 3-642-14018-1 >>>>>>> http://amzn.com/3642140181 >>>>>>> >>>>>>> ------------------------------**----------------- >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which >>>>>> their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> >>>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments >>>>> is infinitely more interesting than any results to which their >>>>> experiments >>>>> lead. 
>>>>> -- Norbert Wiener >>>>> >>>>> >>>>> -- >>> ------------------------------**----------------- >>> Sanjay Govindjee, PhD, PE >>> Professor of Civil Engineering >>> Vice Chair for Academic Affairs >>> >>> 779 Davis Hall >>> Structural Engineering, Mechanics and Materials >>> Department of Civil Engineering >>> University of California >>> Berkeley, CA 94720-1710 >>> >>> Voice: +1 510 642 6060 >>> FAX: +1 510 643 5264 >>> s_g at berkeley.edu >>> http://www.ce.berkeley.edu/~**sanjay >>> ------------------------------**----------------- >>> >>> New Books: >>> >>> Engineering Mechanics of Deformable >>> Solids: A Presentation with Exercises >>> http://www.oup.com/us/catalog/**general/subject/Physics/** >>> MaterialsScience/?view=usa&ci=**9780199651641 >>> http://ukcatalogue.oup.com/**product/9780199651641.do >>> http://amzn.com/0199651647 >>> >>> >>> Engineering Mechanics 3 (Dynamics) >>> http://www.springer.com/**materials/mechanics/book/978-**3-642-14018-1 >>> http://amzn.com/3642140181 >>> >>> ------------------------------**----------------- >>> >>> > -- > ------------------------------**----------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice: +1 510 642 6060 > FAX: +1 510 643 5264 > s_g at berkeley.edu > http://www.ce.berkeley.edu/~**sanjay > ------------------------------**----------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exercises > http://www.oup.com/us/catalog/**general/subject/Physics/** > MaterialsScience/?view=usa&ci=**9780199651641 > http://ukcatalogue.oup.com/**product/9780199651641.do > http://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics) > http://www.springer.com/**materials/mechanics/book/978-**3-642-14018-1 > http://amzn.com/3642140181 > > ------------------------------**----------------- > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From s_g at berkeley.edu Wed Dec 26 19:24:25 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 26 Dec 2012 17:24:25 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> <50DB6E50.3050001@berkeley.edu> Message-ID: <50DBA349.7030307@berkeley.edu> I have re-configured/built using: ./configure PETSC_ARCH=gnu_shared -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} --with-shared-libraries make PETSC_ARCH=gnu_shared all make PETSC_ARCH=gnu_shared test Using the same test problem (src/ksp/ksp/examples/tutorials/ex2.c), on the 100x100 case I get: sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared runex2spooles Norm of error 2.21422e-12 iterations 1 sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared runex2superlu Norm of error 7.66145e-13 iterations 1 One the 500x500 case I get: sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared runex2spooles Norm of error 2.69468e-10 iterations 1 sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared runex2superlu Norm of error 419.953 iterations 1 otool shows: sg-macbook-prolocal:tutorials sg$ otool -L ex2 ex2: /Users/sg/petsc-3.3-p5/gnu_shared/lib/libpetsc.dylib (compatibility version 0.0.0, current version 0.0.0) /usr/X11/lib/libX11.6.dylib (compatibility version 10.0.0, current version 10.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpichcxx.dylib (compatibility version 0.0.0, current version 3.0.0) /usr/local/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.17.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libparmetis.dylib (compatibility version 0.0.0, current version 0.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmetis.dylib (compatibility version 0.0.0, current version 0.0.0) /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib (compatibility version 1.0.0, current version 1.0.0) /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 1.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpichf90.dylib (compatibility version 0.0.0, current version 3.0.0) /usr/local/lib/libgfortran.3.dylib (compatibility version 4.0.0, current version 4.0.0) /usr/local/lib/libquadmath.0.dylib (compatibility version 1.0.0, current version 1.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libpmpich.dylib (compatibility version 0.0.0, current version 3.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpich.dylib (compatibility version 0.0.0, current version 3.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libopa.1.dylib (compatibility version 2.0.0, current version 2.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpl.1.dylib (compatibility version 3.0.0, current version 3.0.0) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0) /usr/local/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0) On 12/26/12 3:08 PM, Matthew Knepley wrote: > > On Wed, Dec 26, 2012 at 4:38 PM, Sanjay Govindjee > wrote: > > I have a macbook pro (Mac OS X 10.7.5) > > % uname -a > Darwin sg-macbook-prolocal.local 11.4.2 Darwin Kernel Version > 11.4.2: Thu Aug 23 16:25:48 PDT 2012; > root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 > > I configured using: > > > ./configure --with-cc=icc --with-fc=ifort > 
-download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} > > so everything was built together. > > > Since > > a) you have tried other compilers > > b) we cannot reproduce it > > c) we are building the library during configure > > I would guess that some outside library, in your default link path, is > contaminating > the executable with symbols which override some of those in SuperLU. > The SuperLU > people are not super careful about naming. Could you > > 1) Try this same exercise using --with-shared-libraries > > 2) Once you do that, use otool -L on the executable so we can see > where everything comes from > > Thanks, > > Matt > > On 12/26/12 1:34 PM, Hong Zhang wrote: > > Sanjay: > > hmmm....I guess that is good news -- in that superlu is > not broken. However, > for me > not so good news since I seems that there is nasty bug > lurking on my > machine. > > Any suggestions on chasing down the error? > > How did you install your supelu_dist with petsc-3.3? > What machine do you use? > > Hong > > > On 12/26/12 1:23 PM, Hong Zhang wrote: > > Sanjay: > I get > petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec -n 2 > ./ex2 > -ksp_monitor_short -ksp_type preonly -pc_type lu > -pc_factor_mat_solver_package superlu_dist -m 500 -n 500 > Norm of error 1.92279e-11 iterations 1 > > Hong > > I have done some more testing of the problem, > continuing with > src/ksp/ksp/examples/tutorials/ex2.c. > > The behavior I am seeing is that with smaller > problems sizes superlu_dist > is > behaving properly > but with larger problem sizes things seem to go > wrong and what goes wrong > is > apparently consistent; the error appears both with > my intel build as well > as > with my gcc build. > > I have two run lines: > > runex2superlu: > -@${MPIEXEC} -n 2 ./ex2 > -ksp_monitor_short -m 100 -n 100 > -ksp_type > preonly -pc_type lu -pc_factor_mat_solver_package > superlu_dist > > runex2spooles: > -@${MPIEXEC} -n 2 ./ex2 > -ksp_monitor_short -m 100 -n 100 > -ksp_type > preonly -pc_type lu -pc_factor_mat_solver_package > spooles > > From my intel build, I get > > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 7.66145e-13 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.21422e-12 iterations 1 > > From my GCC build, I get > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 7.66145e-13 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.21422e-12 iterations 1 > > If I change the -m 100 -n 100 to -m 500 -n 500, I > get for my intel build > > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 419.953 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.69468e-10 iterations 1 > > From my GCC build with -m 500 -n 500, I get > > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 419.953 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.69468e-10 iterations 1 > > > Any suggestions will be greatly appreciated. > > -sanjay > > > > > > > > On 12/23/12 6:42 PM, Matthew Knepley wrote: > > > On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee > > > wrote: > > I decided to go with > ksp/ksp/exampeles/tutorials/ex2.c; I was > unsure how > to convert the run lines for > snes/examples/ex5.c to work with a direct > solver as I am not versed in SNES options. > > Notwithstanding something strange is happening > only on select examples. 
From s_g at berkeley.edu  Wed Dec 26 19:34:56 2012
From: s_g at berkeley.edu (Sanjay Govindjee)
Date: Wed, 26 Dec 2012 17:34:56 -0800
Subject: [petsc-users] Using superlu_dist in a direct solve
In-Reply-To: <50DBA349.7030307@berkeley.edu>
References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> <50DB6E50.3050001@berkeley.edu> <50DBA349.7030307@berkeley.edu>
Message-ID: <50DBA5C0.4050807@berkeley.edu>

For what it is worth, I ran the problems with valgrind (before I built the --with-shared-libraries version).
With spooles the run is essentially clean.  With superlu I see lots of errors of the type:

==91099== Syscall param writev(vector[...]) points to uninitialised byte(s)
==91099==    at 0x1245FF2: writev (in /usr/lib/system/libsystem_kernel.dylib)
==91099==    by 0x101209846: MPIDU_Sock_writev (in ./ex2)
==91099==    by 0x101A2BA23: ???
==91099==    by 0x1FFFFFFFB: ???
==91099==    by 0x101A2BA0F: ???
==91099==    by 0x10852053F: ???
==91099==    by 0x101A24907: ???
==91099==    by 0x7FFF5FBFE2DF: ???
==91099==    by 0x1: ???
==91099==    by 0x10120AF13: MPIDI_CH3_iSendv (in ./ex2)
==91099==  Address 0x10712d0c8 is 136 bytes inside a block of size 1,661,792 alloc'd
==91099==    at 0xC713: malloc (vg_replace_malloc.c:271)
==91099==    by 0x100D5C6DF: superlu_malloc_dist (in ./ex2)
==91099==    by 0x100D23375: doubleMalloc_dist (in ./ex2)
==91099==    by 0x100D415C1: pdgstrs (in ./ex2)
==91099==    by 0x100D3F852: pdgssvx (in ./ex2)
==91099==    by 0x1007E5D38: MatSolve_SuperLU_DIST (in ./ex2)
==91099==    by 0x1002BDA1E: MatSolve (in ./ex2)
==91099==    by 0x1009EAF55: PCApply_LU (in ./ex2)
==91099==    by 0x100AAE053: PCApply (in ./ex2)
==91099==    by 0x100B1BCEE: KSPSolve_PREONLY (in ./ex2)
==91099==    by 0x100B54F55: KSPSolve (in ./ex2)
==91099==    by 0x1000022FC: main (in ./ex2)

On 12/26/12 5:24 PM, Sanjay Govindjee wrote:
> [...]
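The exact valgrind invocation is not shown in the thread; a minimal sketch of how such a run can be launched (assuming valgrind is installed and ex2 is the executable built above; each MPI rank is wrapped in its own valgrind instance with its own log file) is:

# %p expands to the PID of each rank's process in the log-file name
mpiexec -n 2 valgrind --tool=memcheck --track-origins=yes --log-file=valgrind.%p.log \
    ./ex2 -m 500 -n 500 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist

Memcheck commonly reports MPICH's writev of partially initialised send buffers even in correct programs; such reports can be filtered out with a --suppressions=<file> option so that genuine application errors stand out.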
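One way to pursue the symbol-contamination guess quoted above is to ask which of the images in the otool -L list actually export SuperLU_DIST routine names such as pdgssvx (the routine visible in the valgrind stack). This is only a diagnostic sketch, assuming the macOS nm/otool/awk tools and the ex2 executable from the gnu_shared build:

# check the executable itself first
nm -g ./ex2 | grep -i pdgssvx
# then check every dylib the executable loads
for lib in $(otool -L ./ex2 | awk 'NR>1 {print $1}'); do
    echo "== $lib"
    nm -g "$lib" 2>/dev/null | grep -i pdgssvx
done

If the same SuperLU name turns out to be defined in more than one image, the wrong copy may be the one getting called at run time.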
From knepley at gmail.com  Wed Dec 26 20:46:59 2012
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 26 Dec 2012 21:46:59 -0500
Subject: [petsc-users] Using superlu_dist in a direct solve
In-Reply-To: <50DBA5C0.4050807@berkeley.edu>
References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> <50DB6E50.3050001@berkeley.edu> <50DBA349.7030307@berkeley.edu> <50DBA5C0.4050807@berkeley.edu>
Message-ID: 

On Wed, Dec 26, 2012 at 8:34 PM, Sanjay Govindjee wrote:
> For what it is worth, I ran the problems with valgrind (before I built the --with-shared-libraries version).
> With spooles the run is essentially clean.  With superlu I see lots of errors of the type: [...]

This looks like a well-known MPICH problem with valgrind reporting. However, these stacks look strange. You should have source line numbers if this is compiled with debugging, and you should have the whole stack for MPICH.

Also, why is libquadmath being linked?

   Matt
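A side note on the missing line numbers: on OS X, compiling with -g is not always enough for valgrind to show file/line information, because the debug data lives in the object files or in a separate .dSYM bundle rather than in the executable. A sketch of how one might get symbolic stacks (assuming a valgrind built for OS X, which accepts a --dsymutil option, and the dsymutil tool from the Xcode command-line tools):

# generate ex2.dSYM so memcheck can resolve file/line information
dsymutil ./ex2
# or let valgrind run dsymutil itself
mpiexec -n 2 valgrind --dsymutil=yes --log-file=valgrind.%p.log \
    ./ex2 -m 500 -n 500 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist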
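On the libquadmath question: libquadmath is typically pulled into a link as a dependency of the GNU Fortran runtime (libgfortran) rather than requested by PETSc directly. Whether that is the case here can be checked against the library path reported by otool -L above:

# if libgfortran itself depends on libquadmath, its appearance in ex2 is just transitive
otool -L /usr/local/lib/libgfortran.3.dylib

If libquadmath shows up there, its presence in the ex2 link is a side effect of linking gfortran-compiled code and is probably unrelated to the wrong results.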
From s_g at berkeley.edu  Wed Dec 26 22:52:41 2012
From: s_g at berkeley.edu (Sanjay Govindjee)
Date: Wed, 26 Dec 2012 20:52:41 -0800
Subject: [petsc-users] Using superlu_dist in a direct solve
In-Reply-To: 
References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> <50DB6E50.3050001@berkeley.edu> <50DBA349.7030307@berkeley.edu> <50DBA5C0.4050807@berkeley.edu>
Message-ID: <50DBD419.6070204@berkeley.edu>

-g is definitely on.  I'll send the configure.log file to the PETSc maintenance e-mail, petsc-maint at mcs.anl.gov.
 -sanjay

On 12/26/12 6:46 PM, Matthew Knepley wrote:
> This looks like a well-known MPICH problem with valgrind reporting. However, these stacks look strange. You should have source line numbers if this is compiled with debugging, and you should have the whole stack for MPICH.
>
> Also, why is libquadmath being linked?
>
> [...]
> > Matt > > ==91099== Syscall param writev(vector[...]) points to > uninitialised byte(s) > ==91099== at 0x1245FF2: writev (in > /usr/lib/system/libsystem_kernel.dylib) > ==91099== by 0x101209846: MPIDU_Sock_writev (in ./ex2) > ==91099== by 0x101A2BA23: ??? > ==91099== by 0x1FFFFFFFB: ??? > ==91099== by 0x101A2BA0F: ??? > ==91099== by 0x10852053F: ??? > ==91099== by 0x101A24907: ??? > ==91099== by 0x7FFF5FBFE2DF: ??? > ==91099== by 0x1: ??? > ==91099== by 0x10120AF13: MPIDI_CH3_iSendv (in ./ex2) > ==91099== Address 0x10712d0c8 is 136 bytes inside a block of size > 1,661,792 alloc'd > ==91099== at 0xC713: malloc (vg_replace_malloc.c:271) > ==91099== by 0x100D5C6DF: superlu_malloc_dist (in ./ex2) > ==91099== by 0x100D23375: doubleMalloc_dist (in ./ex2) > ==91099== by 0x100D415C1: pdgstrs (in ./ex2) > ==91099== by 0x100D3F852: pdgssvx (in ./ex2) > ==91099== by 0x1007E5D38: MatSolve_SuperLU_DIST (in ./ex2) > ==91099== by 0x1002BDA1E: MatSolve (in ./ex2) > ==91099== by 0x1009EAF55: PCApply_LU (in ./ex2) > ==91099== by 0x100AAE053: PCApply (in ./ex2) > ==91099== by 0x100B1BCEE: KSPSolve_PREONLY (in ./ex2) > ==91099== by 0x100B54F55: KSPSolve (in ./ex2) > ==91099== by 0x1000022FC: main (in ./ex2) > > > > On 12/26/12 5:24 PM, Sanjay Govindjee wrote: >> I have re-configured/built using: >> >> ./configure PETSC_ARCH=gnu_shared >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> --with-shared-libraries >> >> make PETSC_ARCH=gnu_shared all >> >> make PETSC_ARCH=gnu_shared test >> >> >> Using the same test problem >> (src/ksp/ksp/examples/tutorials/ex2.c), on the 100x100 case I get: >> >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2spooles >> Norm of error 2.21422e-12 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2superlu >> Norm of error 7.66145e-13 iterations 1 >> >> One the 500x500 case I get: >> >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2spooles >> Norm of error 2.69468e-10 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2superlu >> Norm of error 419.953 iterations 1 >> >> otool shows: >> >> sg-macbook-prolocal:tutorials sg$ otool -L ex2 >> ex2: >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libpetsc.dylib >> (compatibility version 0.0.0, current version 0.0.0) >> /usr/X11/lib/libX11.6.dylib (compatibility version 10.0.0, >> current version 10.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpichcxx.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /usr/local/lib/libstdc++.6.dylib (compatibility version >> 7.0.0, current version 7.17.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libparmetis.dylib >> (compatibility version 0.0.0, current version 0.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmetis.dylib >> (compatibility version 0.0.0, current version 0.0.0) >> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib >> (compatibility version 1.0.0, current version 1.0.0) >> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib >> (compatibility version 1.0.0, current version 1.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpichf90.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /usr/local/lib/libgfortran.3.dylib (compatibility version >> 4.0.0, current version 4.0.0) >> /usr/local/lib/libquadmath.0.dylib (compatibility version >> 1.0.0, current version 1.0.0) >> 
/Users/sg/petsc-3.3-p5/gnu_shared/lib/libpmpich.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpich.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libopa.1.dylib >> (compatibility version 2.0.0, current version 2.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpl.1.dylib >> (compatibility version 3.0.0, current version 3.0.0) >> /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, >> current version 159.1.0) >> /usr/local/lib/libgcc_s.1.dylib (compatibility version 1.0.0, >> current version 1.0.0) >> >> >> >> >> On 12/26/12 3:08 PM, Matthew Knepley wrote: >>> >>> On Wed, Dec 26, 2012 at 4:38 PM, Sanjay Govindjee >>> > wrote: >>> >>> I have a macbook pro (Mac OS X 10.7.5) >>> >>> % uname -a >>> Darwin sg-macbook-prolocal.local 11.4.2 Darwin Kernel >>> Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; >>> root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 >>> >>> I configured using: >>> >>> >>> ./configure --with-cc=icc --with-fc=ifort >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> so everything was built together. >>> >>> >>> Since >>> >>> a) you have tried other compilers >>> >>> b) we cannot reproduce it >>> >>> c) we are building the library during configure >>> >>> I would guess that some outside library, in your default link >>> path, is contaminating >>> the executable with symbols which override some of those in >>> SuperLU. The SuperLU >>> people are not super careful about naming. Could you >>> >>> 1) Try this same exercise using --with-shared-libraries >>> >>> 2) Once you do that, use otool -L on the executable so we can >>> see where everything comes from >>> >>> Thanks, >>> >>> Matt >>> >>> On 12/26/12 1:34 PM, Hong Zhang wrote: >>> >>> Sanjay: >>> >>> hmmm....I guess that is good news -- in that superlu >>> is not broken. However, >>> for me >>> not so good news since I seems that there is nasty >>> bug lurking on my >>> machine. >>> >>> Any suggestions on chasing down the error? >>> >>> How did you install your supelu_dist with petsc-3.3? >>> What machine do you use? >>> >>> Hong >>> >>> >>> On 12/26/12 1:23 PM, Hong Zhang wrote: >>> >>> Sanjay: >>> I get >>> petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec >>> -n 2 ./ex2 >>> -ksp_monitor_short -ksp_type preonly -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist -m >>> 500 -n 500 >>> Norm of error 1.92279e-11 iterations 1 >>> >>> Hong >>> >>> I have done some more testing of the >>> problem, continuing with >>> src/ksp/ksp/examples/tutorials/ex2.c. >>> >>> The behavior I am seeing is that with >>> smaller problems sizes superlu_dist >>> is >>> behaving properly >>> but with larger problem sizes things seem to >>> go wrong and what goes wrong >>> is >>> apparently consistent; the error appears >>> both with my intel build as well >>> as >>> with my gcc build. 
>>> >>> I have two run lines: >>> >>> runex2superlu: >>> -@${MPIEXEC} -n 2 ./ex2 >>> -ksp_monitor_short -m 100 -n 100 >>> -ksp_type >>> preonly -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist >>> >>> runex2spooles: >>> -@${MPIEXEC} -n 2 ./ex2 >>> -ksp_monitor_short -m 100 -n 100 >>> -ksp_type >>> preonly -pc_type lu >>> -pc_factor_mat_solver_package spooles >>> >>> From my intel build, I get >>> >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 7.66145e-13 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.21422e-12 iterations 1 >>> >>> From my GCC build, I get >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 7.66145e-13 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.21422e-12 iterations 1 >>> >>> If I change the -m 100 -n 100 to -m 500 -n >>> 500, I get for my intel build >>> >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 419.953 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.69468e-10 iterations 1 >>> >>> From my GCC build with -m 500 -n 500, I get >>> >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 419.953 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.69468e-10 iterations 1 >>> >>> >>> Any suggestions will be greatly appreciated. >>> >>> -sanjay >>> >>> >>> >>> >>> >>> >>> >>> On 12/23/12 6:42 PM, Matthew Knepley wrote: >>> >>> >>> On Sun, Dec 23, 2012 at 9:37 PM, Sanjay >>> Govindjee >> > >>> wrote: >>> >>> I decided to go with >>> ksp/ksp/exampeles/tutorials/ex2.c; I was >>> unsure how >>> to convert the run lines for >>> snes/examples/ex5.c to work with a direct >>> solver as I am not versed in SNES options. >>> >>> Notwithstanding something strange is >>> happening only on select examples. >>> With ksp/ksp/exampeles/tutorials/ex2.c >>> and the run line: >>> >>> -@${MPIEXEC} -n 2 ./ex2 >>> -ksp_monitor_short -m 20 -n 20 -ksp_type >>> preonly >>> -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist >>> >>> I get good results (of the order): >>> >>> Norm of error 1.85464e-14 iterations 1 >>> >>> using both superlu_dist and spooles. >>> >>> My BLAS/LAPACK: -llapack -lblas (so >>> native to my machine). >>> >>> If you can guide me on a run line for >>> the snes ex5.c I can try that too. >>> I'll also try to construct a GCC build >>> later to see if that is an issue. >>> >>> >>> Same line on ex5, but ex2 is good enough. >>> However, it will not tell us >>> anything new. Try another build. >>> >>> Matt >>> >>> -sanjay >>> >>> >>> On 12/23/12 5:58 PM, Matthew Knepley wrote: >>> >>> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay >>> Govindjee >> > >>> wrote: >>> >>> Not sure what you mean by where is >>> your matrix? I am simply running >>> ex6 >>> in the ksp/examples/tests directory. >>> >>> The reason I ran this test is >>> because I was seeing the same behavior >>> with >>> my finite element code (on perfectly >>> benign problems). >>> >>> Is there a built-in test that you >>> use to check that superlu_dist is >>> working properly with petsc? >>> i.e. something you know that works >>> with with petsc 3.3-p5? >>> >>> >>> 1) Run it on a SNES ex5 (or KSP ex2), >>> which is a nice Laplacian >>> >>> 2) Compare with MUMPS >>> >>> Matt >>> >>> -sanjay >>> >>> >>> >>> On 12/23/12 4:56 PM, Jed Brown wrote: >>> >>> Where is your matrix? It might be >>> ending up with a very bad pivot. 
If >>> the >>> problem can be reproduced, it should >>> be reported to the SuperLU_DIST >>> developers to fix. (Note that we do >>> not see this with other matrices.) >>> You >>> can also try MUMPS. >>> >>> >>> On Sun, Dec 23, 2012 at 6:48 PM, >>> Sanjay Govindjee >> > >>> wrote: >>> >>> I wanted to use SuperLU Dist to >>> perform a direct solve but seem >>> to be >>> encountering >>> a problem. I was wonder if this >>> is a know issue and if there is a >>> solution for it. >>> >>> The problem is easily observed >>> using ex6.c in >>> src/ksp/ksp/examples/tests. >>> >>> Out of the box: make runex6 >>> produces a residual error of >>> O(1e-11), all >>> is well. >>> >>> I then changed the run to run on >>> two processors and add the flag >>> -pc_factor_mat_solver_package >>> spooles this produces a >>> residual error >>> of >>> O(1e-11), all is still well. >>> >>> I then switch over to >>> -pc_factor_mat_solver_package >>> superlu_dist and >>> the >>> residual error comes back as >>> 22.6637! Something seems very >>> wrong. >>> >>> My build is perfectly vanilla: >>> >>> export >>> PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>> export PETSC_ARCH=intel >>> >>> ./configure --with-cc=icc >>> --with-fc=ifort \ >>> >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> make >>> PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel >>> all >>> make >>> PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel >>> test >>> >>> -sanjay >>> >>> >>> >>> -- >>> ----------------------------------------------- >>> Sanjay Govindjee, PhD, PE >>> Professor of Civil Engineering >>> Vice Chair for Academic Affairs >>> >>> 779 Davis Hall >>> Structural Engineering, Mechanics >>> and Materials >>> Department of Civil Engineering >>> University of California >>> Berkeley, CA 94720-1710 >>> >>> Voice: +1 510 642 6060 >>> >>> FAX: +1 510 643 5264 >>> >>> s_g at berkeley.edu >>> >>> http://www.ce.berkeley.edu/~sanjay >>> >>> ----------------------------------------------- >>> >>> New Books: >>> >>> Engineering Mechanics of Deformable >>> Solids: A Presentation with Exercises >>> >>> >>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>> http://ukcatalogue.oup.com/product/9780199651641.do >>> http://amzn.com/0199651647 >>> >>> >>> Engineering Mechanics 3 (Dynamics) >>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>> http://amzn.com/3642140181 >>> >>> ----------------------------------------------- >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted >>> before they begin their >>> experiments is infinitely more >>> interesting than any results to which >>> their >>> experiments lead. >>> -- Norbert Wiener >>> >>> >>> >>> -- >>> What most experimenters take for granted >>> before they begin their >>> experiments >>> is infinitely more interesting than any >>> results to which their >>> experiments >>> lead. 
>>> -- Norbert Wiener
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to
>>> which their experiments lead.
>>> -- Norbert Wiener >> >> -- >> ----------------------------------------------- >> Sanjay Govindjee, PhD, PE >> Professor of Civil Engineering >> Vice Chair for Academic Affairs >> >> 779 Davis Hall >> Structural Engineering, Mechanics and Materials >> Department of Civil Engineering >> University of California >> Berkeley, CA 94720-1710 >> >> Voice:+1 510 642 6060 >> FAX:+1 510 643 5264 >> s_g at berkeley.edu >> http://www.ce.berkeley.edu/~sanjay >> ----------------------------------------------- >> >> New Books: >> >> Engineering Mechanics of Deformable >> Solids: A Presentation with Exercises >> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >> http://ukcatalogue.oup.com/product/9780199651641.do >> http://amzn.com/0199651647 >> >> >> Engineering Mechanics 3 (Dynamics) >> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >> http://amzn.com/3642140181 >> >> ----------------------------------------------- > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice:+1 510 642 6060 > FAX:+1 510 643 5264 > s_g at berkeley.edu > http://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exercises > http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 > http://ukcatalogue.oup.com/product/9780199651641.do > http://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics) > http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 > http://amzn.com/3642140181 > > ----------------------------------------------- > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- ----------------------------------------------- Sanjay Govindjee, PhD, PE Professor of Civil Engineering Vice Chair for Academic Affairs 779 Davis Hall Structural Engineering, Mechanics and Materials Department of Civil Engineering University of California Berkeley, CA 94720-1710 Voice: +1 510 642 6060 FAX: +1 510 643 5264 s_g at berkeley.edu http://www.ce.berkeley.edu/~sanjay ----------------------------------------------- New Books: Engineering Mechanics of Deformable Solids: A Presentation with Exercises http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 http://ukcatalogue.oup.com/product/9780199651641.do http://amzn.com/0199651647 Engineering Mechanics 3 (Dynamics) http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 http://amzn.com/3642140181 ----------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
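A cross-check with MUMPS, which was suggested earlier in this thread, would need a run line analogous to the two above; a possible sketch only, since the configure line quoted above does not download MUMPS, so this assumes a build reconfigured with --download-mumps --download-scalapack --download-blacs:

runex2mumps:
	-@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 500 -n 500 -ksp_type preonly \
	   -pc_type lu -pc_factor_mat_solver_package mumps

If MUMPS agrees with spooles on the -m 500 -n 500 case while superlu_dist does not, that further points at the SuperLU_DIST factorization rather than the assembled matrix.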
URL: From s_g at berkeley.edu Wed Dec 26 23:00:15 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 26 Dec 2012 21:00:15 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> <50DB6E50.3050001@berkeley.edu> <50DBA349.7030307@berkeley.edu> <50DBA5C0.4050807@berkeley.edu> Message-ID: <50DBD5DF.50301@berkeley.edu> fyi, here is what gets printed when I make ex2: sg-macbook-prolocal:tutorials sg$ make ex2 /Users/sg/petsc-3.3-p5/gnu_shared/bin/mpicc -o ex2.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -fno-inline -O0 -I/Users/sg/petsc-3.3-p5/include -I/Users/sg/petsc-3.3-p5/gnu_shared/include -D__INSDIR__=src/ksp/ksp/examples/tutorials/ ex2.c /Users/sg/petsc-3.3-p5/gnu_shared/bin/mpicc -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -fno-inline -O0 -o ex2 ex2.o -L/Users/sg/petsc-3.3-p5//gnu_shared/lib -L/Users/sg/petsc-3.3-p5/gnu_shared/lib -lpetsc -L/usr/X11R6/lib -lX11 -lpromfei -lprometheus -L/usr/local/lib/gcc/x86_64-apple-darwin11.4.0/4.8.0 -L/usr/local/lib -lmpichcxx -lstdc++ -lsuperlu_dist_3.1 -lparmetis -lmetis -lHYPRE -lmpichcxx -lstdc++ -lml -lmpichcxx -lstdc++ -lpthread -lspooles -llapack -lblas -ldl -lmpichf90 -lpthread -lgfortran -lgfortran -lquadmath -lm -lm -lmpichcxx -lstdc++ -lpmpich -lmpich -lopa -lmpl -lSystem -lgcc_ext.10.5 -ldl /bin/rm -f ex2.o On 12/26/12 6:46 PM, Matthew Knepley wrote: > On Wed, Dec 26, 2012 at 8:34 PM, Sanjay Govindjee > wrote: > > For what it is worth. I ran the problems with valgrind (before I > built the --with-shared-libraries version). > With spooles the run is essentially clean. With superlu I see > lots of errors of the type: > > > This looks like a well-known MPICH problem with valgrind reporting. > However, these > stacks look strange. You should have source line numbers if this is > compiled with debugging > and you should have the whole stack for MPICH. > > Also, why is libquadmath being linked? > > Matt > > ==91099== Syscall param writev(vector[...]) points to > uninitialised byte(s) > ==91099== at 0x1245FF2: writev (in > /usr/lib/system/libsystem_kernel.dylib) > ==91099== by 0x101209846: MPIDU_Sock_writev (in ./ex2) > ==91099== by 0x101A2BA23: ??? > ==91099== by 0x1FFFFFFFB: ??? > ==91099== by 0x101A2BA0F: ??? > ==91099== by 0x10852053F: ??? > ==91099== by 0x101A24907: ??? > ==91099== by 0x7FFF5FBFE2DF: ??? > ==91099== by 0x1: ??? 
> ==91099== by 0x10120AF13: MPIDI_CH3_iSendv (in ./ex2) > ==91099== Address 0x10712d0c8 is 136 bytes inside a block of size > 1,661,792 alloc'd > ==91099== at 0xC713: malloc (vg_replace_malloc.c:271) > ==91099== by 0x100D5C6DF: superlu_malloc_dist (in ./ex2) > ==91099== by 0x100D23375: doubleMalloc_dist (in ./ex2) > ==91099== by 0x100D415C1: pdgstrs (in ./ex2) > ==91099== by 0x100D3F852: pdgssvx (in ./ex2) > ==91099== by 0x1007E5D38: MatSolve_SuperLU_DIST (in ./ex2) > ==91099== by 0x1002BDA1E: MatSolve (in ./ex2) > ==91099== by 0x1009EAF55: PCApply_LU (in ./ex2) > ==91099== by 0x100AAE053: PCApply (in ./ex2) > ==91099== by 0x100B1BCEE: KSPSolve_PREONLY (in ./ex2) > ==91099== by 0x100B54F55: KSPSolve (in ./ex2) > ==91099== by 0x1000022FC: main (in ./ex2) > > > > On 12/26/12 5:24 PM, Sanjay Govindjee wrote: >> I have re-configured/built using: >> >> ./configure PETSC_ARCH=gnu_shared >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> --with-shared-libraries >> >> make PETSC_ARCH=gnu_shared all >> >> make PETSC_ARCH=gnu_shared test >> >> >> Using the same test problem >> (src/ksp/ksp/examples/tutorials/ex2.c), on the 100x100 case I get: >> >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2spooles >> Norm of error 2.21422e-12 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2superlu >> Norm of error 7.66145e-13 iterations 1 >> >> One the 500x500 case I get: >> >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2spooles >> Norm of error 2.69468e-10 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2superlu >> Norm of error 419.953 iterations 1 >> >> otool shows: >> >> sg-macbook-prolocal:tutorials sg$ otool -L ex2 >> ex2: >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libpetsc.dylib >> (compatibility version 0.0.0, current version 0.0.0) >> /usr/X11/lib/libX11.6.dylib (compatibility version 10.0.0, >> current version 10.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpichcxx.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /usr/local/lib/libstdc++.6.dylib (compatibility version >> 7.0.0, current version 7.17.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libparmetis.dylib >> (compatibility version 0.0.0, current version 0.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmetis.dylib >> (compatibility version 0.0.0, current version 0.0.0) >> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib >> (compatibility version 1.0.0, current version 1.0.0) >> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib >> (compatibility version 1.0.0, current version 1.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpichf90.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /usr/local/lib/libgfortran.3.dylib (compatibility version >> 4.0.0, current version 4.0.0) >> /usr/local/lib/libquadmath.0.dylib (compatibility version >> 1.0.0, current version 1.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libpmpich.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpich.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libopa.1.dylib >> (compatibility version 2.0.0, current version 2.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpl.1.dylib >> (compatibility version 3.0.0, current version 3.0.0) >> /usr/lib/libSystem.B.dylib 
(compatibility version 1.0.0, >> current version 159.1.0) >> /usr/local/lib/libgcc_s.1.dylib (compatibility version 1.0.0, >> current version 1.0.0) >> >> >> >> >> On 12/26/12 3:08 PM, Matthew Knepley wrote: >>> >>> On Wed, Dec 26, 2012 at 4:38 PM, Sanjay Govindjee >>> > wrote: >>> >>> I have a macbook pro (Mac OS X 10.7.5) >>> >>> % uname -a >>> Darwin sg-macbook-prolocal.local 11.4.2 Darwin Kernel >>> Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; >>> root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 >>> >>> I configured using: >>> >>> >>> ./configure --with-cc=icc --with-fc=ifort >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> so everything was built together. >>> >>> >>> Since >>> >>> a) you have tried other compilers >>> >>> b) we cannot reproduce it >>> >>> c) we are building the library during configure >>> >>> I would guess that some outside library, in your default link >>> path, is contaminating >>> the executable with symbols which override some of those in >>> SuperLU. The SuperLU >>> people are not super careful about naming. Could you >>> >>> 1) Try this same exercise using --with-shared-libraries >>> >>> 2) Once you do that, use otool -L on the executable so we can >>> see where everything comes from >>> >>> Thanks, >>> >>> Matt >>> >>> On 12/26/12 1:34 PM, Hong Zhang wrote: >>> >>> Sanjay: >>> >>> hmmm....I guess that is good news -- in that superlu >>> is not broken. However, >>> for me >>> not so good news since I seems that there is nasty >>> bug lurking on my >>> machine. >>> >>> Any suggestions on chasing down the error? >>> >>> How did you install your supelu_dist with petsc-3.3? >>> What machine do you use? >>> >>> Hong >>> >>> >>> On 12/26/12 1:23 PM, Hong Zhang wrote: >>> >>> Sanjay: >>> I get >>> petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec >>> -n 2 ./ex2 >>> -ksp_monitor_short -ksp_type preonly -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist -m >>> 500 -n 500 >>> Norm of error 1.92279e-11 iterations 1 >>> >>> Hong >>> >>> I have done some more testing of the >>> problem, continuing with >>> src/ksp/ksp/examples/tutorials/ex2.c. >>> >>> The behavior I am seeing is that with >>> smaller problems sizes superlu_dist >>> is >>> behaving properly >>> but with larger problem sizes things seem to >>> go wrong and what goes wrong >>> is >>> apparently consistent; the error appears >>> both with my intel build as well >>> as >>> with my gcc build. 
>>> >>> I have two run lines: >>> >>> runex2superlu: >>> -@${MPIEXEC} -n 2 ./ex2 >>> -ksp_monitor_short -m 100 -n 100 >>> -ksp_type >>> preonly -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist >>> >>> runex2spooles: >>> -@${MPIEXEC} -n 2 ./ex2 >>> -ksp_monitor_short -m 100 -n 100 >>> -ksp_type >>> preonly -pc_type lu >>> -pc_factor_mat_solver_package spooles >>> >>> From my intel build, I get >>> >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 7.66145e-13 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.21422e-12 iterations 1 >>> >>> From my GCC build, I get >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 7.66145e-13 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.21422e-12 iterations 1 >>> >>> If I change the -m 100 -n 100 to -m 500 -n >>> 500, I get for my intel build >>> >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 419.953 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.69468e-10 iterations 1 >>> >>> From my GCC build with -m 500 -n 500, I get >>> >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 419.953 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.69468e-10 iterations 1 >>> >>> >>> Any suggestions will be greatly appreciated. >>> >>> -sanjay >>> >>> >>> >>> >>> >>> >>> >>> On 12/23/12 6:42 PM, Matthew Knepley wrote: >>> >>> >>> On Sun, Dec 23, 2012 at 9:37 PM, Sanjay >>> Govindjee >> > >>> wrote: >>> >>> I decided to go with >>> ksp/ksp/exampeles/tutorials/ex2.c; I was >>> unsure how >>> to convert the run lines for >>> snes/examples/ex5.c to work with a direct >>> solver as I am not versed in SNES options. >>> >>> Notwithstanding something strange is >>> happening only on select examples. >>> With ksp/ksp/exampeles/tutorials/ex2.c >>> and the run line: >>> >>> -@${MPIEXEC} -n 2 ./ex2 >>> -ksp_monitor_short -m 20 -n 20 -ksp_type >>> preonly >>> -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist >>> >>> I get good results (of the order): >>> >>> Norm of error 1.85464e-14 iterations 1 >>> >>> using both superlu_dist and spooles. >>> >>> My BLAS/LAPACK: -llapack -lblas (so >>> native to my machine). >>> >>> If you can guide me on a run line for >>> the snes ex5.c I can try that too. >>> I'll also try to construct a GCC build >>> later to see if that is an issue. >>> >>> >>> Same line on ex5, but ex2 is good enough. >>> However, it will not tell us >>> anything new. Try another build. >>> >>> Matt >>> >>> -sanjay >>> >>> >>> On 12/23/12 5:58 PM, Matthew Knepley wrote: >>> >>> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay >>> Govindjee >> > >>> wrote: >>> >>> Not sure what you mean by where is >>> your matrix? I am simply running >>> ex6 >>> in the ksp/examples/tests directory. >>> >>> The reason I ran this test is >>> because I was seeing the same behavior >>> with >>> my finite element code (on perfectly >>> benign problems). >>> >>> Is there a built-in test that you >>> use to check that superlu_dist is >>> working properly with petsc? >>> i.e. something you know that works >>> with with petsc 3.3-p5? >>> >>> >>> 1) Run it on a SNES ex5 (or KSP ex2), >>> which is a nice Laplacian >>> >>> 2) Compare with MUMPS >>> >>> Matt >>> >>> -sanjay >>> >>> >>> >>> On 12/23/12 4:56 PM, Jed Brown wrote: >>> >>> Where is your matrix? It might be >>> ending up with a very bad pivot. 
If >>> the >>> problem can be reproduced, it should >>> be reported to the SuperLU_DIST >>> developers to fix. (Note that we do >>> not see this with other matrices.) >>> You >>> can also try MUMPS. >>> >>> >>> On Sun, Dec 23, 2012 at 6:48 PM, >>> Sanjay Govindjee >> > >>> wrote: >>> >>> I wanted to use SuperLU Dist to >>> perform a direct solve but seem >>> to be >>> encountering >>> a problem. I was wonder if this >>> is a know issue and if there is a >>> solution for it. >>> >>> The problem is easily observed >>> using ex6.c in >>> src/ksp/ksp/examples/tests. >>> >>> Out of the box: make runex6 >>> produces a residual error of >>> O(1e-11), all >>> is well. >>> >>> I then changed the run to run on >>> two processors and add the flag >>> -pc_factor_mat_solver_package >>> spooles this produces a >>> residual error >>> of >>> O(1e-11), all is still well. >>> >>> I then switch over to >>> -pc_factor_mat_solver_package >>> superlu_dist and >>> the >>> residual error comes back as >>> 22.6637! Something seems very >>> wrong. >>> >>> My build is perfectly vanilla: >>> >>> export >>> PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>> export PETSC_ARCH=intel >>> >>> ./configure --with-cc=icc >>> --with-fc=ifort \ >>> >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> make >>> PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel >>> all >>> make >>> PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel >>> test >>> >>> -sanjay >>> >>> >>> >>> -- >>> ----------------------------------------------- >>> Sanjay Govindjee, PhD, PE >>> Professor of Civil Engineering >>> Vice Chair for Academic Affairs >>> >>> 779 Davis Hall >>> Structural Engineering, Mechanics >>> and Materials >>> Department of Civil Engineering >>> University of California >>> Berkeley, CA 94720-1710 >>> >>> Voice: +1 510 642 6060 >>> >>> FAX: +1 510 643 5264 >>> >>> s_g at berkeley.edu >>> >>> http://www.ce.berkeley.edu/~sanjay >>> >>> ----------------------------------------------- >>> >>> New Books: >>> >>> Engineering Mechanics of Deformable >>> Solids: A Presentation with Exercises >>> >>> >>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>> http://ukcatalogue.oup.com/product/9780199651641.do >>> http://amzn.com/0199651647 >>> >>> >>> Engineering Mechanics 3 (Dynamics) >>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>> http://amzn.com/3642140181 >>> >>> ----------------------------------------------- >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted >>> before they begin their >>> experiments is infinitely more >>> interesting than any results to which >>> their >>> experiments lead. >>> -- Norbert Wiener >>> >>> >>> >>> -- >>> What most experimenters take for granted >>> before they begin their >>> experiments >>> is infinitely more interesting than any >>> results to which their >>> experiments >>> lead. 
>>> -- Norbert Wiener
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to
>>> which their experiments lead.
>>> -- Norbert Wiener >> >> -- >> ----------------------------------------------- >> Sanjay Govindjee, PhD, PE >> Professor of Civil Engineering >> Vice Chair for Academic Affairs >> >> 779 Davis Hall >> Structural Engineering, Mechanics and Materials >> Department of Civil Engineering >> University of California >> Berkeley, CA 94720-1710 >> >> Voice:+1 510 642 6060 >> FAX:+1 510 643 5264 >> s_g at berkeley.edu >> http://www.ce.berkeley.edu/~sanjay >> ----------------------------------------------- >> >> New Books: >> >> Engineering Mechanics of Deformable >> Solids: A Presentation with Exercises >> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >> http://ukcatalogue.oup.com/product/9780199651641.do >> http://amzn.com/0199651647 >> >> >> Engineering Mechanics 3 (Dynamics) >> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >> http://amzn.com/3642140181 >> >> ----------------------------------------------- > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice:+1 510 642 6060 > FAX:+1 510 643 5264 > s_g at berkeley.edu > http://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exercises > http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 > http://ukcatalogue.oup.com/product/9780199651641.do > http://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics) > http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 > http://amzn.com/3642140181 > > ----------------------------------------------- > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- ----------------------------------------------- Sanjay Govindjee, PhD, PE Professor of Civil Engineering Vice Chair for Academic Affairs 779 Davis Hall Structural Engineering, Mechanics and Materials Department of Civil Engineering University of California Berkeley, CA 94720-1710 Voice: +1 510 642 6060 FAX: +1 510 643 5264 s_g at berkeley.edu http://www.ce.berkeley.edu/~sanjay ----------------------------------------------- New Books: Engineering Mechanics of Deformable Solids: A Presentation with Exercises http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 http://ukcatalogue.oup.com/product/9780199651641.do http://amzn.com/0199651647 Engineering Mechanics 3 (Dynamics) http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 http://amzn.com/3642140181 ----------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarua at iit.edu Thu Dec 27 00:22:28 2012 From: abarua at iit.edu (amlan barua) Date: Thu, 27 Dec 2012 00:22:28 -0600 Subject: [petsc-users] (no subject) Message-ID: Hi, Is there an analogue of VecScatterCreateToZero for DA vectors? The DMDA object has more than one degrees of freedom. If there isn't any, should I use an IS object to do the scattering? Amlan -------------- next part -------------- An HTML attachment was scrubbed... 
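Barry's recipe further down in this thread (DMDACreateNaturalVector(), DMDAGlobalToNatural{Begin,End}(), then VecScatterCreateToZero() on the natural vector) can be assembled into the following sketch; it assumes an existing DMDA da and a global vector xin on it, and the names natural, tozero and onzero are only illustrative:

/* Gather a DMDA global vector (any number of dof) onto rank 0,
   in natural ordering with the dof interlaced. */
Vec            natural, onzero;
VecScatter     tozero;
PetscErrorCode ierr;

ierr = DMDACreateNaturalVector(da, &natural);CHKERRQ(ierr);
ierr = DMDAGlobalToNaturalBegin(da, xin, INSERT_VALUES, natural);CHKERRQ(ierr);
ierr = DMDAGlobalToNaturalEnd(da, xin, INSERT_VALUES, natural);CHKERRQ(ierr);

ierr = VecScatterCreateToZero(natural, &tozero, &onzero);CHKERRQ(ierr);
ierr = VecScatterBegin(tozero, natural, onzero, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
ierr = VecScatterEnd(tozero, natural, onzero, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

/* ... work with onzero on rank 0 ... */

ierr = VecScatterDestroy(&tozero);CHKERRQ(ierr);
ierr = VecDestroy(&onzero);CHKERRQ(ierr);
ierr = VecDestroy(&natural);CHKERRQ(ierr);

As the replies below point out, a full gather onto one rank is often unnecessary; if only neighbouring values are needed, ghost updates through DMGlobalToLocalBegin/End are the lighter option.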
URL: From knepley at gmail.com Thu Dec 27 06:36:04 2012 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 27 Dec 2012 07:36:04 -0500 Subject: [petsc-users] (no subject) In-Reply-To: References: Message-ID: On Thu, Dec 27, 2012 at 1:22 AM, amlan barua wrote: > Hi, > Is there an analogue of VecScatterCreateToZero for DA vectors? The DMDA > object has more than one degrees of freedom. > If there isn't any, should I use an IS object to do the scattering? > I do not understand. Why can't you give your DA vector as input? Matt > > Amlan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Dec 27 08:18:19 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 27 Dec 2012 08:18:19 -0600 Subject: [petsc-users] (no subject) In-Reply-To: References: Message-ID: ierr = DMDACreateNaturalVector(da,&natural);CHKERRQ(ierr); ierr = DMDAGlobalToNaturalBegin(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); ierr = DMDAGlobalToNaturalEnd(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); Now do VecScatterCreateToZero() from natural and the vector will be in the natural ordering on process zero with the dof interlaced. Barry On Dec 27, 2012, at 12:22 AM, amlan barua wrote: > Hi, > Is there an analogue of VecScatterCreateToZero for DA vectors? The DMDA object has more than one degrees of freedom. > If there isn't any, should I use an IS object to do the scattering? > Amlan From thomas.witkowski at tu-dresden.de Thu Dec 27 10:10:22 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 27 Dec 2012 17:10:22 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <20121221220521.qbp4io8kws040o8g@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> <20121221220521.qbp4io8kws040o8g@mail.zih.tu-dresden.de> Message-ID: <50DC72EE.20001@tu-dresden.de> Have anyone of you tried to reproduce this problem? Thomas Am 21.12.2012 22:05, schrieb Thomas Witkowski: > So, here it is. Just compile and run with > > mpiexec -np 64 ./ex10 -ksp_type preonly -pc_type lu > -pc_factor_mat_solver_package superlu_dist -log_summary > > 64 cores: 0.09 seconds for solving > 1024 cores: 2.6 seconds for solving > > Thomas > > > Zitat von Jed Brown : > >> Can you reproduce this in a simpler environment so that we can report >> it? >> As I understand your statement, it sounds like you could reproduce by >> changing src/ksp/ksp/examples/tutorials/ex10.c to create a subcomm of >> size >> 4 and the using that everywhere, then compare log_summary running on 4 >> cores to running on more (despite everything really being independent) >> >> It would also be worth using an MPI profiler to see if it's really >> spending >> a lot of time in MPI_Iprobe. Since SuperLU_DIST does not use >> MPI_Iprobe, it >> may be something else. >> >> On Fri, Dec 21, 2012 at 8:51 AM, Thomas Witkowski < >> Thomas.Witkowski at tu-dresden.de> wrote: >> >>> I use a modified MPICH version. On the system I use for these >>> benchmarks I >>> cannot use another MPI library. >>> >>> I'm not fixed to MUMPS. 
Superlu_dist, for example, works also perfectly >>> for this. But there is still the following problem I cannot solve: >>> When I >>> increase the number of coarse space matrices, there seems to be no >>> scaling >>> direct solver for this. Just to summaries: >>> - one coarse space matrix is created always by one "cluster" >>> consisting of >>> four subdomanins/MPI tasks >>> - the four tasks are always local to one node, thus inter-node network >>> communication is not required for computing factorization and solve >>> - independent of the number of cluster, the coarse space matrices >>> are the >>> same, have the same number of rows, nnz structure but possibly >>> different >>> values >>> - there is NO load unbalancing >>> - the matrices must be factorized and there are a lot of solves (> 100) >>> with them >>> >>> It should be pretty clear, that computing LU factorization and solving >>> with it should scale perfectly. But at the moment, all direct solver I >>> tried (mumps, superlu_dist, pastix) are not able to scale. The loos of >>> scale is really worse, as you can see from the numbers I send before. >>> >>> Any ideas? Suggestions? Without a scaling solver method for these >>> kind of >>> systems, my multilevel FETI-DP code is just more or less a joke, >>> only some >>> orders of magnitude slower than standard FETI-DP method :) >>> >>> Thomas >>> >>> Zitat von Jed Brown : >>> >>> MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). What MPI >>>> implementation have you been using? Is the behavior different with a >>>> different implementation? >>>> >>>> >>>> On Fri, Dec 21, 2012 at 2:36 AM, Thomas Witkowski < >>>> thomas.witkowski at tu-dresden.de**> wrote: >>>> >>>> Okay, I did a similar benchmark now with PETSc's event logging: >>>>> >>>>> UMFPACK >>>>> 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 >>>>> 0.0e+00 >>>>> 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 >>>>> 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 >>>>> 0.0e+00 >>>>> 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 >>>>> 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 >>>>> 0.0e+00 >>>>> 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 >>>>> >>>>> MUMPS >>>>> 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 >>>>> 0.0e+00 >>>>> 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 >>>>> 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 >>>>> 0.0e+00 >>>>> 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 >>>>> 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 >>>>> 0.0e+00 >>>>> 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 >>>>> >>>>> >>>>> As you see, the local solves with UMFPACK have nearly constant >>>>> time with >>>>> increasing number of subdomains. This is what I expect. The I replace >>>>> UMFPACK by MUMPS and I see increasing time for local solves. In >>>>> the last >>>>> columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's >>>>> column >>>>> increases here from 75 to 82. What does this mean? >>>>> >>>>> Thomas >>>>> >>>>> Am 21.12.2012 02:19, schrieb Matthew Knepley: >>>>> >>>>> On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski >>>>> >>>>>> >>>>> >>>>>> >> >>>>>> >>>>>> wrote: >>>>>> >>>>>> I cannot use the information from log_summary, as I have three >>>>>>> different >>>>>>> LU >>>>>>> factorizations and solve (local matrices and two hierarchies of >>>>>>> coarse >>>>>>> grids). Therefore, I use the following work around to get the >>>>>>> timing of >>>>>>> the >>>>>>> solve I'm intrested in: >>>>>>> >>>>>>> You misunderstand how to use logging. You just put these thing in >>>>>> separate stages. 
Stages represent >>>>>> parts of the code over which events are aggregated. >>>>>> >>>>>> Matt >>>>>> >>>>>> MPI::COMM_WORLD.Barrier(); >>>>>> >>>>>>> wtime = MPI::Wtime(); >>>>>>> KSPSolve(*(data->ksp_schur_****primal_local), tmp_primal, >>>>>>> >>>>>>> tmp_primal); >>>>>>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>>>>>> >>>>>>> The factorization is done explicitly before with "KSPSetUp", so >>>>>>> I can >>>>>>> measure the time for LU factorization. It also does not scale! >>>>>>> For 64 >>>>>>> cores, >>>>>>> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all >>>>>>> calculations, >>>>>>> the >>>>>>> local coarse space matrices defined on four cores have exactly >>>>>>> the same >>>>>>> number of rows and exactly the same number of non zero entries. So, >>>>>>> from >>>>>>> my >>>>>>> point of view, the time should be absolutely constant. >>>>>>> >>>>>>> Thomas >>>>>>> >>>>>>> Zitat von Barry Smith : >>>>>>> >>>>>>> >>>>>>> Are you timing ONLY the time to factor and solve the >>>>>>> subproblems? >>>>>>> Or >>>>>>> >>>>>>>> also the time to get the data to the collection of 4 cores at >>>>>>>> a time? >>>>>>>> >>>>>>>> If you are only using LU for these problems and not >>>>>>>> elsewhere in >>>>>>>> the >>>>>>>> code you can get the factorization and time from MatLUFactor() >>>>>>>> and >>>>>>>> MatSolve() or you can use stages to put this calculation in >>>>>>>> its own >>>>>>>> stage >>>>>>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>>>>>> Also look at the load balancing column for the factorization and >>>>>>>> solve >>>>>>>> stage, it is well balanced? >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >>>>>>>> >>>>>>> >>>>>>>> >> >>>>>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>> In my multilevel FETI-DP code, I have localized course matrices, >>>>>>>> which >>>>>>>> >>>>>>>>> are defined on only a subset of all MPI tasks, typically >>>>>>>>> between 4 >>>>>>>>> and 64 >>>>>>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>>>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU >>>>>>>>> factorization of >>>>>>>>> the matrices is computed with either MUMPS or superlu_dist, >>>>>>>>> but both >>>>>>>>> show >>>>>>>>> some scaling property I really wonder of: When the overall >>>>>>>>> problem >>>>>>>>> size is >>>>>>>>> increased, the solve with the LU factorization of the local >>>>>>>>> matrices >>>>>>>>> does >>>>>>>>> not scale! But why not? I just increase the number of local >>>>>>>>> matrices, >>>>>>>>> but >>>>>>>>> all of them are independent of each other. Some example: I use 64 >>>>>>>>> cores, >>>>>>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI >>>>>>>>> communicators >>>>>>>>> with 16 coarse space matrices. The problem need to solve 192 >>>>>>>>> times >>>>>>>>> with the >>>>>>>>> coarse space systems, and this takes together 0.09 seconds. >>>>>>>>> Now I >>>>>>>>> increase >>>>>>>>> the number of cores to 256, but let the local coarse space be >>>>>>>>> defined >>>>>>>>> again >>>>>>>>> on only 4 cores. Again, 192 solutions with these coarse >>>>>>>>> spaces are >>>>>>>>> required, but now this takes 0.24 seconds. The same for 1024 >>>>>>>>> cores, >>>>>>>>> and we >>>>>>>>> are at 1.7 seconds for the local coarse space solver! >>>>>>>>> >>>>>>>>> For me, this is a total mystery! Any idea how to explain, >>>>>>>>> debug and >>>>>>>>> eventually how to resolve this problem? 
>>>>>>>>> >>>>>>>>> Thomas >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which >>>>>> their experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >> > > From bsmith at mcs.anl.gov Thu Dec 27 10:40:44 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 27 Dec 2012 10:40:44 -0600 Subject: [petsc-users] (no subject) In-Reply-To: References: Message-ID: <118BE79A-7DD9-4D02-8EE7-650010BAF1D2@mcs.anl.gov> On Dec 27, 2012, at 10:34 AM, amlan barua wrote: > I think I can use VecSetValues, is that right? Yes you could do that. But since you are using a DMDA you could also use DMGetLocalVector(), DMGlobalToLocalBegin/End() followed by DMDAVecGetArray() to access the ghost values. Barry > Amlan > > > On Thu, Dec 27, 2012 at 9:04 AM, amlan barua wrote: > Hi Barry, > Is this scattering a very costly operation? I have to compute x[i] = f(x[i-1]) where f is known. Since this operation is strictly sequential, I thought of gathering the entire vector on processor 0, do the sequential operation there and scatter the result back. However this is unnecessary because I only need the bordering x[i] values. What can be a better way? > Amlan > > > On Thu, Dec 27, 2012 at 8:18 AM, Barry Smith wrote: > > ierr = DMDACreateNaturalVector(da,&natural);CHKERRQ(ierr); > ierr = DMDAGlobalToNaturalBegin(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); > ierr = DMDAGlobalToNaturalEnd(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); > > Now do VecScatterCreateToZero() from natural and the vector will be in the natural ordering on process zero with the dof interlaced. > > > Barry > > On Dec 27, 2012, at 12:22 AM, amlan barua wrote: > > > Hi, > > Is there an analogue of VecScatterCreateToZero for DA vectors? The DMDA object has more than one degrees of freedom. > > If there isn't any, should I use an IS object to do the scattering? > > Amlan > > > From jefonseca at gmail.com Fri Dec 28 10:58:02 2012 From: jefonseca at gmail.com (Jim Fonseca) Date: Fri, 28 Dec 2012 11:58:02 -0500 Subject: [petsc-users] how to determine if complex matrix has imaginary components Message-ID: Hi, Is there a computationally fast way to determine if the imaginary components of a matrix are zero or very small? Thanks, Jim -- Jim Fonseca, PhD Research Scientist Network for Computational Nanotechnology Purdue University 765-496-6495 www.jimfonseca.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 28 11:37:48 2012 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Dec 2012 12:37:48 -0500 Subject: [petsc-users] how to determine if complex matrix has imaginary components In-Reply-To: References: Message-ID: On Fri, Dec 28, 2012 at 11:58 AM, Jim Fonseca wrote: > Hi, > Is there a computationally fast way to determine if the imaginary > components of a matrix are zero or very small? > I can't think of anything faster than checking each entry. Matt > Thanks, > Jim > > > -- > Jim Fonseca, PhD > Research Scientist > Network for Computational Nanotechnology > Purdue University > 765-496-6495 > www.jimfonseca.com > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
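On the question just above about detecting negligible imaginary parts: as the reply below notes, there is little alternative to sweeping the stored entries. A minimal sketch, assuming a complex-scalar build of PETSc; the helper name and the tolerance argument are only illustrative:

#include <petscmat.h>

/* Returns PETSC_TRUE in *realish if every stored entry of A has
   |imaginary part| <= tol. Each process checks only its own rows,
   then the results are combined with an allreduce. */
PetscErrorCode MatImaginaryPartNegligible(Mat A, PetscReal tol, PetscBool *realish)
{
  PetscErrorCode    ierr;
  PetscInt          rstart, rend, i, j, ncols;
  PetscInt          ok = 1, gok;
  const PetscScalar *vals;
  MPI_Comm          comm;

  PetscFunctionBegin;
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend && ok; i++) {
    ierr = MatGetRow(A, i, &ncols, PETSC_NULL, &vals);CHKERRQ(ierr);
    for (j = 0; j < ncols; j++) {
      if (PetscAbsReal(PetscImaginaryPart(vals[j])) > tol) { ok = 0; break; }
    }
    ierr = MatRestoreRow(A, i, &ncols, PETSC_NULL, &vals);CHKERRQ(ierr);
  }
  ierr = PetscObjectGetComm((PetscObject)A, &comm);CHKERRQ(ierr);
  ierr = MPI_Allreduce(&ok, &gok, 1, MPIU_INT, MPI_LAND, comm);CHKERRQ(ierr);
  *realish = gok ? PETSC_TRUE : PETSC_FALSE;
  PetscFunctionReturn(0);
}

For an AIJ matrix this touches each stored nonzero exactly once, so the cost is proportional to the number of nonzeros.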
URL: From slivkaje at gmail.com Fri Dec 28 16:01:03 2012 From: slivkaje at gmail.com (Jelena Slivka) Date: Fri, 28 Dec 2012 23:01:03 +0100 Subject: [petsc-users] (no subject) Message-ID: Hello! I have a few simple questions about PETSc about functions that I can't seem to find in the documentation: 1) Is there a way to automatically create a matrix in which all elements are the same scalar value a, e.g. something like ones(m,n) in Matlab? 2) Is there an equivalent to Matlab .* operator? 3) Is there a function that can create matrix C by appending matrices A and B? Grateful in advance -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Fri Dec 28 17:02:47 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Fri, 28 Dec 2012 17:02:47 -0600 Subject: [petsc-users] (no subject) In-Reply-To: References: Message-ID: On Fri, Dec 28, 2012 at 4:01 PM, Jelena Slivka wrote: > Hello! > I have a few simple questions about PETSc about functions that I can't > seem to find in the documentation: > 1) Is there a way to automatically create a matrix in which all elements > are the same scalar value a, e.g. something like ones(m,n) in Matlab? > That matrix (or any very low rank matrix) should not be stored explicitly as a dense matrix. 2) Is there an equivalent to Matlab .* operator? > There is not a MatPointwiseMult(). It could be added, but I'm not aware of a use for this operator outside of matrix misuse (using a Mat to represent an array of numbers that are not an operator, thus should really be a Vec, perhaps managed using DMDA). > 3) Is there a function that can create matrix C by appending matrices A > and B? > Grateful in advance > Block matrices can be manipulated efficiently using MATNEST, but there is a very high probability of misuse unless you really understand why that is an appropriate data structure. Much more likely, you should create a matrix of size C, then assemble the parts of A and B into it, perhaps using MatGetLocalSubMatrix() so that the assembly "looks" like assembling A and B separately. Note that in parallel, you almost never want "concatenation" in the matrix sense of [A B; C D]. Instead, you want that there is some row and column permutation in which the operation would be concatenation, but in reality, the matrices are actually interleaved with some granularity so that both are well-distributed on the parallel machine. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarua at iit.edu Sat Dec 29 01:40:26 2012 From: abarua at iit.edu (amlan barua) Date: Sat, 29 Dec 2012 01:40:26 -0600 Subject: [petsc-users] (no subject) In-Reply-To: <118BE79A-7DD9-4D02-8EE7-650010BAF1D2@mcs.anl.gov> References: <118BE79A-7DD9-4D02-8EE7-650010BAF1D2@mcs.anl.gov> Message-ID: Hi Barry, I wrote the following piece according to your suggestions. Currently it does nothing but creates a vector with 1 at 1th position, 2 at 2th and so on. But I made it serial, i.e. (n+1)th place is computed using the value of nth place. My question, did I do it correctly, i.e. is it safe or results may change depending on problem size? This is much faster than VecSetValues, I believe the communication is minimum here because I take the advantage of ghost points. 
Amlan PetscInitialize(&argc,&argv,(char *)0,help); ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size); CHKERRQ(ierr); ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr); ierr = DMDACreate1d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,size*5,1,1,PETSC_NULL,&da); CHKERRQ(ierr); ierr = DMCreateGlobalVector(da,&vec); CHKERRQ(ierr); ierr = VecSet(vec,1.00); ierr = DMCreateLocalVector(da,&local); ierr = DMDAGetLocalInfo(da,&info); ierr = DMDAVecGetArray(da,vec,&arr); ierr = DMDAVecGetArray(da,local,&array); temp = 1; for (j=0;j wrote: > > On Dec 27, 2012, at 10:34 AM, amlan barua wrote: > > > I think I can use VecSetValues, is that right? > > Yes you could do that. But since you are using a DMDA you could also > use DMGetLocalVector(), DMGlobalToLocalBegin/End() followed by > DMDAVecGetArray() to access the ghost values. > > Barry > > > Amlan > > > > > > On Thu, Dec 27, 2012 at 9:04 AM, amlan barua wrote: > > Hi Barry, > > Is this scattering a very costly operation? I have to compute x[i] = > f(x[i-1]) where f is known. Since this operation is strictly sequential, I > thought of gathering the entire vector on processor 0, do the sequential > operation there and scatter the result back. However this is unnecessary > because I only need the bordering x[i] values. What can be a better way? > > Amlan > > > > > > On Thu, Dec 27, 2012 at 8:18 AM, Barry Smith wrote: > > > > ierr = DMDACreateNaturalVector(da,&natural);CHKERRQ(ierr); > > ierr = > DMDAGlobalToNaturalBegin(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); > > ierr = > DMDAGlobalToNaturalEnd(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); > > > > Now do VecScatterCreateToZero() from natural and the vector will be in > the natural ordering on process zero with the dof interlaced. > > > > > > Barry > > > > On Dec 27, 2012, at 12:22 AM, amlan barua wrote: > > > > > Hi, > > > Is there an analogue of VecScatterCreateToZero for DA vectors? The > DMDA object has more than one degrees of freedom. > > > If there isn't any, should I use an IS object to do the scattering? > > > Amlan > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sat Dec 29 10:19:50 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 29 Dec 2012 10:19:50 -0600 Subject: [petsc-users] (no subject) In-Reply-To: References: <118BE79A-7DD9-4D02-8EE7-650010BAF1D2@mcs.anl.gov> Message-ID: On Sat, Dec 29, 2012 at 1:40 AM, amlan barua wrote: > Hi Barry, > I wrote the following piece according to your suggestions. Currently it > does nothing but creates a vector with 1 at 1th position, 2 at 2th and so > on. But I made it serial, i.e. (n+1)th place is computed using the value of > nth place. My question, did I do it correctly, i.e. is it safe or results > may change depending on problem size? This is much faster than > VecSetValues, I believe the communication is minimum here because I take > the advantage of ghost points. > Amlan > > PetscInitialize(&argc,&argv,(char *)0,help); > ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size); CHKERRQ(ierr); > ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr); > ierr = > DMDACreate1d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,size*5,1,1,PETSC_NULL,&da); > CHKERRQ(ierr); > ierr = DMCreateGlobalVector(da,&vec); CHKERRQ(ierr); > ierr = VecSet(vec,1.00); > ierr = DMCreateLocalVector(da,&local); > ierr = DMDAGetLocalInfo(da,&info); > ierr = DMDAVecGetArray(da,vec,&arr); > ierr = DMDAVecGetArray(da,local,&array); > temp = 1; > for (j=0;j This is needlessly sequential (or should be). 
> ierr = DMGlobalToLocalBegin(da,vec,INSERT_VALUES,local); > CHKERRQ(ierr); > ierr = DMGlobalToLocalEnd(da,vec,INSERT_VALUES,local); > CHKERRQ(ierr); > You should never use a communication routine while you have access to the array (*VecGetArray()). > if (rank==j) { > for (i=info.xs;i if ((!i)==0) { > array[i] = array[i] + array[i-1]; > What sort of recurrence do you actually want to implement. When possible, it's much better to reorganize so that you can do local work followed by an MPI_Scan followed by more local work. MPI_Scan is fast (logarithmic). > arr[i] = array[i]; > } > } > } > } > ierr = DMDAVecRestoreArray(da,local,&array); > ierr = DMDAVecRestoreArray(da,vec,&arr); > ierr = VecView(vec,PETSC_VIEWER_STDOUT_WORLD); > PetscFinalize(); > return 0; > > > > On Thu, Dec 27, 2012 at 10:40 AM, Barry Smith wrote: > >> >> On Dec 27, 2012, at 10:34 AM, amlan barua wrote: >> >> > I think I can use VecSetValues, is that right? >> >> Yes you could do that. But since you are using a DMDA you could also >> use DMGetLocalVector(), DMGlobalToLocalBegin/End() followed by >> DMDAVecGetArray() to access the ghost values. >> >> Barry >> >> > Amlan >> > >> > >> > On Thu, Dec 27, 2012 at 9:04 AM, amlan barua wrote: >> > Hi Barry, >> > Is this scattering a very costly operation? I have to compute x[i] = >> f(x[i-1]) where f is known. Since this operation is strictly sequential, I >> thought of gathering the entire vector on processor 0, do the sequential >> operation there and scatter the result back. However this is unnecessary >> because I only need the bordering x[i] values. What can be a better way? >> > Amlan >> > >> > >> > On Thu, Dec 27, 2012 at 8:18 AM, Barry Smith >> wrote: >> > >> > ierr = DMDACreateNaturalVector(da,&natural);CHKERRQ(ierr); >> > ierr = >> DMDAGlobalToNaturalBegin(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); >> > ierr = >> DMDAGlobalToNaturalEnd(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); >> > >> > Now do VecScatterCreateToZero() from natural and the vector will be in >> the natural ordering on process zero with the dof interlaced. >> > >> > >> > Barry >> > >> > On Dec 27, 2012, at 12:22 AM, amlan barua wrote: >> > >> > > Hi, >> > > Is there an analogue of VecScatterCreateToZero for DA vectors? The >> DMDA object has more than one degrees of freedom. >> > > If there isn't any, should I use an IS object to do the scattering? >> > > Amlan >> > >> > >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sat Dec 29 12:59:54 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 29 Dec 2012 12:59:54 -0600 Subject: [petsc-users] Direct Schur complement domain decomposition In-Reply-To: <002b01cdde05$6a2d0330$3e870990$@tuhh.de> References: <002b01cdde05$6a2d0330$3e870990$@tuhh.de> Message-ID: Sorry for the slow reply. What you are describing _is_ multifrontal factorization, or alternatively, (non-iterative) substructuring. It is a direct solve and boils down to a few large dense direct solves. Incomplete factorization is one way of preventing the Schur complements from getting too dense, but it's not very reliable. There are many other ways of retaining structure in the supernodes (i.e., avoid unstructured dense matrices), at the expense of some error. These methods "compress" the Schur complement using low-rank representations for long-range interaction. These are typically combined with an iterative method. 
Multigrid and multilevel DD methods can be thought of as an alternate way to compress (approximately) the long-range interaction coming from inexact elimination (dimensional reduction of interfaces). On Wed, Dec 19, 2012 at 10:25 AM, Stefan Kurzbach wrote: > Hello everybody,**** > > ** ** > > in my recent research on parallelization of a 2D unstructured flow model > code I came upon a question on domain decomposition techniques in ?grids?. > Maybe someone knows of any previous results on this?**** > > ** ** > > Typically, when doing large simulations with many unknowns, the problem is > distributed to many computer nodes and solved in parallel by some iterative > method. Many of these iterative methods boil down to a large number of > distributed matrix-vector multiplications (in the order of the number of > iterations). This means there are many synchronization points in the > algorithms, which makes them tightly coupled. This has been found to work > well on clusters with fast networks.**** > > ** ** > > Now my question:**** > > What if there is a small number of very powerful nodes (say less than 10), > which are connected by a slow network, e.g. several computer clusters > connected over the internet (some people call this ?grid computing?). I > expect that the traditional iterative methods will not be as efficient here > (any references?).**** > > ** ** > > My guess is that a solution method with fewer synchronization points will > work better, even though that method may be computationally more expensive > than traditional methods. An example would be a domain composition approach > with direct solution of the Schur complement on the interface. This > requires that the interface size has to be small compared to the subdomain > size. As this algorithm basically works in three decoupled phases (solve > the subdomains for several right hand sides, assemble and solve the Schur > complement system, correct the subdomain results) it should be suited well, > but I have no idea how to test or otherwise prove it. Has anybody made any > thoughts on this before, possibly dating back to the 80ies and 90ies, where > slow networks were more common?**** > > ** ** > > Best regards**** > > Stefan**** > > ** ** > > ** ** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Dec 29 15:21:00 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 29 Dec 2012 15:21:00 -0600 Subject: [petsc-users] Direct Schur complement domain decomposition In-Reply-To: References: <002b01cdde05$6a2d0330$3e870990$@tuhh.de> Message-ID: <9FFBA092-74B0-4CF3-AF65-45A001FDAC2E@mcs.anl.gov> My off the cuff response is that "computing the exact Schur complements for the subdomains is sooooo expensive that it swamps out any savings in reducing the amount of communication" plus it requires soooo much memory. Thus solvers like these may make sense only when the problem is "non-standard" enough that iterative methods simply don't work (perhaps due to extreme ill-conditioning), such problems do exist but for most "PDE" problems with enough time and effort one can cook up the right combination of "block-splittings" and multilevel (multigrid) methods to get a much more efficient solver that gives you the accuracy you need long before the Schur complements have been computed. Barry On Dec 29, 2012, at 12:59 PM, Jed Brown wrote: > Sorry for the slow reply. What you are describing _is_ multifrontal factorization, or alternatively, (non-iterative) substructuring. 
It is a direct solve and boils down to a few large dense direct solves. Incomplete factorization is one way of preventing the Schur complements from getting too dense, but it's not very reliable. > > There are many other ways of retaining structure in the supernodes (i.e., avoid unstructured dense matrices), at the expense of some error. These methods "compress" the Schur complement using low-rank representations for long-range interaction. These are typically combined with an iterative method. > > Multigrid and multilevel DD methods can be thought of as an alternate way to compress (approximately) the long-range interaction coming from inexact elimination (dimensional reduction of interfaces). > > On Wed, Dec 19, 2012 at 10:25 AM, Stefan Kurzbach wrote: > Hello everybody, > > > > in my recent research on parallelization of a 2D unstructured flow model code I came upon a question on domain decomposition techniques in ?grids?. Maybe someone knows of any previous results on this? > > > > Typically, when doing large simulations with many unknowns, the problem is distributed to many computer nodes and solved in parallel by some iterative method. Many of these iterative methods boil down to a large number of distributed matrix-vector multiplications (in the order of the number of iterations). This means there are many synchronization points in the algorithms, which makes them tightly coupled. This has been found to work well on clusters with fast networks. > > > > Now my question: > > What if there is a small number of very powerful nodes (say less than 10), which are connected by a slow network, e.g. several computer clusters connected over the internet (some people call this ?grid computing?). I expect that the traditional iterative methods will not be as efficient here (any references?). > > > > My guess is that a solution method with fewer synchronization points will work better, even though that method may be computationally more expensive than traditional methods. An example would be a domain composition approach with direct solution of the Schur complement on the interface. This requires that the interface size has to be small compared to the subdomain size. As this algorithm basically works in three decoupled phases (solve the subdomains for several right hand sides, assemble and solve the Schur complement system, correct the subdomain results) it should be suited well, but I have no idea how to test or otherwise prove it. Has anybody made any thoughts on this before, possibly dating back to the 80ies and 90ies, where slow networks were more common? > > > > Best regards > > Stefan > > > > > > From slivkaje at gmail.com Sat Dec 29 21:41:47 2012 From: slivkaje at gmail.com (Jelena Slivka) Date: Sun, 30 Dec 2012 04:41:47 +0100 Subject: [petsc-users] MatAXPY Segmentation violation Message-ID: Hello, I am experiencing the strange behavior when calling the MatAXPY function. Here is my code: matrix similarity is a square matrix (n=m) I create the matrix aux that has all zero elements, except for the diagonal. The elements of the diagonal in matrix aux are sums of rows in matrix similarity. 
MatSetFromOptions(similarity); int n, m; MatGetSize(similarity, &n, &m); Vec tmp; VecCreate(PETSC_COMM_WORLD, &tmp); VecSetSizes(tmp, PETSC_DECIDE, n); VecSetFromOptions(tmp); MatGetRowSum(similarity, tmp); Mat aux; MatCreate(PETSC_COMM_WORLD, &aux); MatSetSizes(aux, PETSC_DECIDE, PETSC_DECIDE, n, m); MatSetFromOptions(aux); MatSetUp(aux); MatZeroEntries(aux); MatDiagonalSet(aux, tmp, INSERT_VALUES); VecDestroy(&tmp); MatAXPY(aux, -1, similarity, DIFFERENT_NONZERO_PATTERN); If I execute this code using only one process I get the segmentation violation error: [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] MatAXPYGetPreallocation_SeqAIJ line 2562 src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: [0] MatAXPY_SeqAIJ line 2587 src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: [0] MatAXPY line 29 src/mat/utils/axpy.c [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Signal received! However, if I run the same code using two processes it runs ok and gives the good result. Could you please tell me what am I doing wrong? Grateful in advance -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sat Dec 29 21:52:33 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 29 Dec 2012 21:52:33 -0600 Subject: [petsc-users] MatAXPY Segmentation violation In-Reply-To: References: Message-ID: You might rather use MatDuplicate(similarity,MAT_DO_NOT_COPY_VALUES,&aux). Can you try these? 1. using the debugger to get a stack trace 2. run in valgrind to check for memory errors 3. set up a test case so we can reproduce On Sat, Dec 29, 2012 at 9:41 PM, Jelena Slivka wrote: > Hello, > > I am experiencing the strange behavior when calling the MatAXPY function. > Here is my code: > matrix similarity is a square matrix (n=m) > I create the matrix aux that has all zero elements, except for the > diagonal. The elements of the diagonal in matrix aux are sums of rows in > matrix similarity. 
> > MatSetFromOptions(similarity); > int n, m; > MatGetSize(similarity, &n, &m); > > Vec tmp; > VecCreate(PETSC_COMM_WORLD, &tmp); > VecSetSizes(tmp, PETSC_DECIDE, n); > VecSetFromOptions(tmp); > MatGetRowSum(similarity, tmp); > > Mat aux; > MatCreate(PETSC_COMM_WORLD, &aux); > MatSetSizes(aux, PETSC_DECIDE, PETSC_DECIDE, n, m); > MatSetFromOptions(aux); > MatSetUp(aux); > MatZeroEntries(aux); > MatDiagonalSet(aux, tmp, INSERT_VALUES); > VecDestroy(&tmp); > > MatAXPY(aux, -1, similarity, DIFFERENT_NONZERO_PATTERN); > > If I execute this code using only one process I get the segmentation > violation error: > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSCERROR: or try > http://valgrind.org on GNU/linux and Apple Mac OS X to find memory > corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] MatAXPYGetPreallocation_SeqAIJ line 2562 > src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: [0] MatAXPY_SeqAIJ line 2587 src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: [0] MatAXPY line 29 src/mat/utils/axpy.c > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Signal received! > > However, if I run the same code using two processes it runs ok and gives > the good result. > Could you please tell me what am I doing wrong? > Grateful in advance > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From slivkaje at gmail.com Sat Dec 29 22:04:09 2012 From: slivkaje at gmail.com (Jelena Slivka) Date: Sun, 30 Dec 2012 05:04:09 +0100 Subject: [petsc-users] MatAXPY Segmentation violation In-Reply-To: References: Message-ID: Thank you very much! Using MatDuplicate solved the problem. On Sun, Dec 30, 2012 at 4:52 AM, Jed Brown wrote: > You might rather use MatDuplicate(similarity,MAT_DO_NOT_COPY_VALUES,&aux). > > Can you try these? > > 1. using the debugger to get a stack trace > 2. run in valgrind to check for memory errors > 3. set up a test case so we can reproduce > > > > On Sat, Dec 29, 2012 at 9:41 PM, Jelena Slivka wrote: > >> Hello, >> >> I am experiencing the strange behavior when calling the MatAXPY function. >> Here is my code: >> matrix similarity is a square matrix (n=m) >> I create the matrix aux that has all zero elements, except for the >> diagonal. The elements of the diagonal in matrix aux are sums of rows in >> matrix similarity. 
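For readers looking for the concrete change, a minimal sketch of the MatDuplicate variant Jed suggests is given here. It assumes "similarity" is an already assembled square AIJ matrix, and error checking (ierr/CHKERRQ) is omitted as in the original post:

Vec rowsum;
Mat aux;
PetscInt n, m;

MatGetSize(similarity, &n, &m);
/* rowsum must share the row layout of similarity; PETSC_DECIDE matches the default layout used here */
VecCreate(PETSC_COMM_WORLD, &rowsum);
VecSetSizes(rowsum, PETSC_DECIDE, n);
VecSetFromOptions(rowsum);
MatGetRowSum(similarity, rowsum);            /* rowsum(i) = sum_j similarity(i,j) */

/* aux inherits the parallel layout and nonzero pattern of similarity, with values
   set to zero, instead of being created with no preallocated structure */
MatDuplicate(similarity, MAT_DO_NOT_COPY_VALUES, &aux);
MatDiagonalSet(aux, rowsum, INSERT_VALUES);  /* aux = D */
MatAXPY(aux, -1.0, similarity, DIFFERENT_NONZERO_PATTERN);  /* aux = D - S */

VecDestroy(&rowsum);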
>> >> MatSetFromOptions(similarity); >> int n, m; >> MatGetSize(similarity, &n, &m); >> >> Vec tmp; >> VecCreate(PETSC_COMM_WORLD, &tmp); >> VecSetSizes(tmp, PETSC_DECIDE, n); >> VecSetFromOptions(tmp); >> MatGetRowSum(similarity, tmp); >> >> Mat aux; >> MatCreate(PETSC_COMM_WORLD, &aux); >> MatSetSizes(aux, PETSC_DECIDE, PETSC_DECIDE, n, m); >> MatSetFromOptions(aux); >> MatSetUp(aux); >> MatZeroEntries(aux); >> MatDiagonalSet(aux, tmp, INSERT_VALUES); >> VecDestroy(&tmp); >> >> MatAXPY(aux, -1, similarity, DIFFERENT_NONZERO_PATTERN); >> >> If I execute this code using only one process I get the segmentation >> violation error: >> >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [0]PETSC ERROR: or see >> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSCERROR: or try >> http://valgrind.org on GNU/linux and Apple Mac OS X to find memory >> corruption errors >> [0]PETSC ERROR: likely location of problem given in stack below >> [0]PETSC ERROR: --------------------- Stack Frames >> ------------------------------------ >> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not >> available, >> [0]PETSC ERROR: INSTEAD the line number of the start of the function >> [0]PETSC ERROR: is given. >> [0]PETSC ERROR: [0] MatAXPYGetPreallocation_SeqAIJ line 2562 >> src/mat/impls/aij/seq/aij.c >> [0]PETSC ERROR: [0] MatAXPY_SeqAIJ line 2587 src/mat/impls/aij/seq/aij.c >> [0]PETSC ERROR: [0] MatAXPY line 29 src/mat/utils/axpy.c >> [0]PETSC ERROR: --------------------- Error Message >> ------------------------------------ >> [0]PETSC ERROR: Signal received! >> >> However, if I run the same code using two processes it runs ok and gives >> the good result. >> Could you please tell me what am I doing wrong? >> Grateful in advance >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sat Dec 29 23:32:04 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 29 Dec 2012 23:32:04 -0600 Subject: [petsc-users] (no subject) In-Reply-To: References: <118BE79A-7DD9-4D02-8EE7-650010BAF1D2@mcs.anl.gov> Message-ID: On Sat, Dec 29, 2012 at 11:15 PM, amlan barua wrote: > Hi, > I am actually trying to implement a 'parallel' ordinary differential > equation solver. > For proper functioning of the algorithm (the name is parareal), I need to > implement a simple recurrence relation of the form x[i+1] = f(x[i]), f > known, depends on quadrature one would like to use. > What is the best way to implement a sequential operation on a parallel > structure? > Somehow I need to keep all but one process idle which. > It's not very parallel when all but one process is idle. ;-D > So I wrote a loop over all processes and within the loop I forced only one > process to update its part. Say when j=0 ideally only the processor 0 > should work. But others will update their local j value before 0th > processor much faster. Thus I am concerned about the safety of the > operation. Will it be okay if I modify my code as following, please advice? > > Yes, this works, but the performance won't be great when using DMGlobalToLocal despite only really wanting to send the update to the next process in the sequence. (Maybe you don't care about performance yet, or this particular part won't be performance-sensitive.) 
> PetscInitialize(&argc,&argv,(char *)0,help); > ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size); CHKERRQ(ierr); > ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr); > ierr = > DMDACreate1d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,size*5,1,1,PETSC_NULL,&da); > CHKERRQ(ierr); > ierr = DMCreateGlobalVector(da,&vec); CHKERRQ(ierr); > ierr = VecSet(vec,1.00); > ierr = DMCreateLocalVector(da,&local); > ierr = DMDAGetLocalInfo(da,&info); > temp = 1; > for (j=0;j ierr = DMGlobalToLocalBegin(da,vec,INSERT_VALUES,local); > CHKERRQ(ierr); > ierr = DMGlobalToLocalEnd(da,vec,INSERT_VALUES,local); > CHKERRQ(ierr); > ierr = DMDAVecGetArray(da,vec,&arr); > ierr = DMDAVecGetArray(da,local,&array); > if (rank==j) { > for (i=info.xs;i if ((!i)==0) { > I would write if (i) unless I was intentionally trying to confuse the reader. > array[i] = array[i] + array[i-1]; > arr[i] = array[i]; > } > } > } > ierr = DMDAVecRestoreArray(da,local,&array); > ierr = DMDAVecRestoreArray(da,vec,&arr); > } > ierr = VecView(vec,PETSC_VIEWER_STDOUT_WORLD); > PetscFinalize(); > return 0; > Amlan > > > On Sat, Dec 29, 2012 at 10:19 AM, Jed Brown wrote: > >> On Sat, Dec 29, 2012 at 1:40 AM, amlan barua wrote: >> >>> Hi Barry, >>> I wrote the following piece according to your suggestions. Currently it >>> does nothing but creates a vector with 1 at 1th position, 2 at 2th and so >>> on. But I made it serial, i.e. (n+1)th place is computed using the value of >>> nth place. My question, did I do it correctly, i.e. is it safe or results >>> may change depending on problem size? This is much faster than >>> VecSetValues, I believe the communication is minimum here because I take >>> the advantage of ghost points. >>> Amlan >>> >>> PetscInitialize(&argc,&argv,(char *)0,help); >>> ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size); CHKERRQ(ierr); >>> ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr); >>> ierr = >>> DMDACreate1d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,size*5,1,1,PETSC_NULL,&da); >>> CHKERRQ(ierr); >>> ierr = DMCreateGlobalVector(da,&vec); CHKERRQ(ierr); >>> ierr = VecSet(vec,1.00); >>> ierr = DMCreateLocalVector(da,&local); >>> ierr = DMDAGetLocalInfo(da,&info); >>> ierr = DMDAVecGetArray(da,vec,&arr); >>> ierr = DMDAVecGetArray(da,local,&array); >>> temp = 1; >>> for (j=0;j>> >> >> This is needlessly sequential (or should be). >> >> >>> ierr = DMGlobalToLocalBegin(da,vec,INSERT_VALUES,local); >>> CHKERRQ(ierr); >>> ierr = DMGlobalToLocalEnd(da,vec,INSERT_VALUES,local); >>> CHKERRQ(ierr); >>> >> >> You should never use a communication routine while you have access to the >> array (*VecGetArray()). >> >> >>> if (rank==j) { >>> for (i=info.xs;i>> if ((!i)==0) { >>> array[i] = array[i] + array[i-1]; >>> >> >> What sort of recurrence do you actually want to implement. When possible, >> it's much better to reorganize so that you can do local work followed by an >> MPI_Scan followed by more local work. MPI_Scan is fast (logarithmic). >> >> >>> arr[i] = array[i]; >>> } >>> } >>> } >>> } >>> ierr = DMDAVecRestoreArray(da,local,&array); >>> ierr = DMDAVecRestoreArray(da,vec,&arr); >>> ierr = VecView(vec,PETSC_VIEWER_STDOUT_WORLD); >>> PetscFinalize(); >>> return 0; >>> >>> >>> >>> On Thu, Dec 27, 2012 at 10:40 AM, Barry Smith wrote: >>> >>>> >>>> On Dec 27, 2012, at 10:34 AM, amlan barua wrote: >>>> >>>> > I think I can use VecSetValues, is that right? >>>> >>>> Yes you could do that. 
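To make the "local work, then MPI_Scan, then local work" suggestion quoted above concrete: for the running-sum test problem in this thread (each entry becomes itself plus the previous entry), a sketch might look like the following. It is plain MPI on a local array xloc of length nloc (names assumed for illustration, real double-precision values assumed), and it only applies when the recurrence can be rewritten as an associative accumulation; a general x[i+1] = f(x[i]) cannot be rearranged this way, which is the reason for algorithms such as parareal in the first place.

double   localsum = 0.0, inclusive, offset;
PetscInt i;

/* 1. local work: inclusive prefix sums of the locally owned entries */
for (i = 0; i < nloc; i++) {
  localsum += xloc[i];
  xloc[i]   = localsum;
}

/* 2. one logarithmic reduction: inclusive scan of the per-rank totals */
MPI_Scan(&localsum, &inclusive, 1, MPI_DOUBLE, MPI_SUM, PETSC_COMM_WORLD);
offset = inclusive - localsum;   /* sum of the totals of all previous ranks */

/* 3. local work: shift the local prefix sums by the per-rank offset */
for (i = 0; i < nloc; i++) xloc[i] += offset;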
But since you are using a DMDA you could also >>>> use DMGetLocalVector(), DMGlobalToLocalBegin/End() followed by >>>> DMDAVecGetArray() to access the ghost values. >>>> >>>> Barry >>>> >>>> > Amlan >>>> > >>>> > >>>> > On Thu, Dec 27, 2012 at 9:04 AM, amlan barua wrote: >>>> > Hi Barry, >>>> > Is this scattering a very costly operation? I have to compute x[i] = >>>> f(x[i-1]) where f is known. Since this operation is strictly sequential, I >>>> thought of gathering the entire vector on processor 0, do the sequential >>>> operation there and scatter the result back. However this is unnecessary >>>> because I only need the bordering x[i] values. What can be a better way? >>>> > Amlan >>>> > >>>> > >>>> > On Thu, Dec 27, 2012 at 8:18 AM, Barry Smith >>>> wrote: >>>> > >>>> > ierr = DMDACreateNaturalVector(da,&natural);CHKERRQ(ierr); >>>> > ierr = >>>> DMDAGlobalToNaturalBegin(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); >>>> > ierr = >>>> DMDAGlobalToNaturalEnd(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); >>>> > >>>> > Now do VecScatterCreateToZero() from natural and the vector will be >>>> in the natural ordering on process zero with the dof interlaced. >>>> > >>>> > >>>> > Barry >>>> > >>>> > On Dec 27, 2012, at 12:22 AM, amlan barua wrote: >>>> > >>>> > > Hi, >>>> > > Is there an analogue of VecScatterCreateToZero for DA vectors? The >>>> DMDA object has more than one degrees of freedom. >>>> > > If there isn't any, should I use an IS object to do the scattering? >>>> > > Amlan >>>> > >>>> > >>>> > >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Mon Dec 31 00:06:58 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Mon, 31 Dec 2012 00:06:58 -0600 Subject: [petsc-users] PetscKernel_A_gets_inverse_A_ In-Reply-To: <50D42C0F.2090301@unibas.it> References: <50D42C0F.2090301@unibas.it> Message-ID: Sorry about the slow response. Do you only want the square kernels with static (compile-time) size? These are currently implemented as macros, but it's not a problem to modify them to be functions. C99 provides a well-defined mechanism to have a single version with external linkage, but also encourage inlining. I'm considering providing a set of kernels that would support non-square matrices, but the naming conventions would have to change and my applications would frequently not know the size statically (but it would typically be ~10 or less, so calling BLAS doesn't make sense). On Fri, Dec 21, 2012 at 3:29 AM, Aldo Bonfiglioli < aldo.bonfiglioli at unibas.it> wrote: > Dear all, > would it be possible to have a unified interface (also Fortran callable) > to the PetscKernel_A_gets_inverse_A_ routines? > I find them very useful within my own piece > of Fortran code to solve small dense linear system (which I have > to do very frequently). > I have my own interface, at present, but I need to > change it as needed when a new PETSc version is released. > > Regards, > Aldo > -- > Dr. Aldo Bonfiglioli > Associate professor of Fluid Flow Machinery > Scuola di Ingegneria > Universita' della Basilicata > V.le dell'Ateneo lucano, 10 85100 Potenza ITALY > tel:+39.0971.205203 fax:+39.0971.205215 > > > Publications list > -------------- next part -------------- An HTML attachment was scrubbed... URL:
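The C99 mechanism Jed refers to is the inline-definition-in-a-header plus one extern declaration pattern. A sketch is given below; the 2x2 in-place inverse and all names are purely illustrative and are not the actual PetscKernel_A_gets_inverse_A_ code. It assumes C99 inline semantics (e.g. -std=c99); the single external definition is what keeps the function callable from Fortran or from call sites the compiler chooses not to inline.

/* kernels_example.h (illustrative) */
/* C99 inline definition: every translation unit that includes this header can
   inline the call, but this definition by itself emits no external symbol. */
inline void KernelExample_A_gets_inverse_A_2(double a[4])
{
  /* in-place inverse of a 2x2 row-major matrix; no singularity check */
  double det = a[0]*a[3] - a[1]*a[2];
  double a00 = a[0];
  a[0] =  a[3]/det;  a[1] = -a[1]/det;
  a[2] = -a[2]/det;  a[3] =  a00/det;
}

/* kernels_example.c (illustrative) */
/* #include "kernels_example.h" */
/* Exactly one translation unit declares the function with extern, which turns
   the inline definition above into the one external definition, so non-inlined
   and Fortran calls still link against a single symbol. */
extern inline void KernelExample_A_gets_inverse_A_2(double a[4]);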