From slivkaje at gmail.com Sat Dec 1 12:05:06 2012 From: slivkaje at gmail.com (Jelena Slivka) Date: Sat, 1 Dec 2012 13:05:06 -0500 Subject: [petsc-users] Solving A*X = B where A and B are matrices Message-ID: Hello! I am trying to solve A*X = B where A and B are matrices, and then find trace of the resulting matrix X. My approach has been to partition matrix B in column vectors bi and then solve each system A*xi = bi. Then, for all vectors xi I would extract i-th element xi(i) and sum those elements in order to get Trace(X). Pseudo-code: 1) load matrices A and B 2) transpose matrix B (so that each right-hand side bi is in the row, as operation MatGetColumnVector is slow) 3) set up KSPSolve 4) create vector diagonal (in which xi(i) elements will be stored) 5) for each row i of matrix B owned by current process: - create vector bi by extracting row i from matrix B - apply KSPsolve to get xi - insert value xi(i) in diagonal vector (only the process which holds the ith value of vector x(i) should do so) 6) sum vector diagonal to get the trace. However, my code (attached, along with the test case) runs fine on one process, but hangs if started on multiple processes. Could you please help me figure out what am I doing wrong? Also, could you please tell me is it possible to use Cholesky factorization when running on multiple processes (I see that I cannot use it when I set the format of matrix A to MPIAIJ)? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Experiment.c Type: text/x-csrc Size: 3789 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Abin Type: application/octet-stream Size: 136 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Bbin Type: application/octet-stream Size: 136 bytes Desc: not available URL: From bsmith at mcs.anl.gov Sat Dec 1 17:03:33 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 1 Dec 2012 17:03:33 -0600 Subject: [petsc-users] Solving A*X = B where A and B are matrices In-Reply-To: References: Message-ID: <55DD94DF-150F-4917-AA26-C0680107E065@mcs.anl.gov> We recommend following the directions http://www.mcs.anl.gov/petsc/documentation/faq.html#schurcomplement for computing a Schur complement; just skip the unneeded step. MUMPS supports a parallel Cholesky but you can also use a parallel LU with MUMPS, PaSTIX or SuperLU_Dist and those will work fine also. With current software Cholesky in parallel is not tons better than LU so generally not worth monkeying with. Barry On Dec 1, 2012, at 12:05 PM, Jelena Slivka wrote: > Hello! > I am trying to solve A*X = B where A and B are matrices, and then find trace of the resulting matrix X. My approach has been to partition matrix B in column vectors bi and then solve each system A*xi = bi. Then, for all vectors xi I would extract i-th element xi(i) and sum those elements in order to get Trace(X). 
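(For anyone following the FAQ route Barry points to above: it amounts to converting B to a dense matrix, factoring A once with a parallel direct solver, and calling MatMatSolve() for all right-hand sides at once; the trace is then just the sum of the diagonal of X. A rough, untested sketch in PETSc-3.3-style C is below. It assumes A and B are already loaded as square parallel matrices, that PETSc was configured with MUMPS, and it leaves out cleanup; the variable names are placeholders, not code from the attached Experiment.c.)

   Mat            F,Bdense,X;
   Vec            diag;
   IS             isrow,iscol;
   MatFactorInfo  info;
   PetscScalar    trace;
   PetscErrorCode ierr;

   /* MatMatSolve() wants a dense right-hand-side matrix */
   ierr = MatConvert(B,MATDENSE,MAT_INITIAL_MATRIX,&Bdense);CHKERRQ(ierr);
   ierr = MatDuplicate(Bdense,MAT_DO_NOT_COPY_VALUES,&X);CHKERRQ(ierr);

   /* factor A once with a parallel direct solver (MUMPS here; SuperLU_Dist or PaSTIX work the same way) */
   ierr = MatGetFactor(A,MATSOLVERMUMPS,MAT_FACTOR_LU,&F);CHKERRQ(ierr);
   ierr = MatFactorInfoInitialize(&info);CHKERRQ(ierr);
   ierr = MatGetOrdering(A,MATORDERINGNATURAL,&isrow,&iscol);CHKERRQ(ierr);
   ierr = MatLUFactorSymbolic(F,A,isrow,iscol,&info);CHKERRQ(ierr);
   ierr = MatLUFactorNumeric(F,A,&info);CHKERRQ(ierr);

   /* solve A*X = B for all columns of B in one call */
   ierr = MatMatSolve(F,Bdense,X);CHKERRQ(ierr);

   /* Trace(X) = sum of the diagonal entries of X */
   ierr = MatGetVecs(A,&diag,PETSC_NULL);CHKERRQ(ierr);
   ierr = MatGetDiagonal(X,diag);CHKERRQ(ierr);
   ierr = VecSum(diag,&trace);CHKERRQ(ierr);

This avoids the per-column KSPSolve loop entirely, which is also where a parallel hang is easy to trigger if the processes do not all make the same collective calls the same number of times.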
> Pseudo-code: > 1) load matrices A and B > 2) transpose matrix B (so that each right-hand side bi is in the row, as operation MatGetColumnVector is slow) > 3) set up KSPSolve > 4) create vector diagonal (in which xi(i) elements will be stored) > 5) for each row i of matrix B owned by current process: > - create vector bi by extracting row i from matrix B > - apply KSPsolve to get xi > - insert value xi(i) in diagonal vector (only the process which > holds the ith value of vector x(i) should do so) > 6) sum vector diagonal to get the trace. > However, my code (attached, along with the test case) runs fine on one process, but hangs if started on multiple processes. Could you please help me figure out what am I doing wrong? > Also, could you please tell me is it possible to use Cholesky factorization when running on multiple processes (I see that I cannot use it when I set the format of matrix A to MPIAIJ)? > > From w_ang_temp at 163.com Sun Dec 2 08:45:47 2012 From: w_ang_temp at 163.com (w_ang_temp) Date: Sun, 2 Dec 2012 22:45:47 +0800 (CST) Subject: [petsc-users] Is there something to be paid attention to about MatIsSymmetric? Message-ID: <7c30a630.9645.13b5c1487f1.Coremail.w_ang_temp@163.com> Hello, I use MatIsSymmetric to know if the matrix A is symmetric. According to my model, it should be symmetric due to the theory. But I always get the result 'PetscBool *flg = 0', although I set 'tol' a large value(0.001). Because the matrix is of 20000 dimension, I can not output the matrix to the txt. So I want to konw if there is something to be paid attention to about the function 'MatIsSymmetric' in version 3.2. Or do I have some other ways to determine the symmetry.I think symmetry is one of the most important thing in my analysis. Thanks. Jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sun Dec 2 09:10:51 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sun, 2 Dec 2012 09:10:51 -0600 Subject: [petsc-users] Is there something to be paid attention to about MatIsSymmetric? In-Reply-To: <7c30a630.9645.13b5c1487f1.Coremail.w_ang_temp@163.com> References: <7c30a630.9645.13b5c1487f1.Coremail.w_ang_temp@163.com> Message-ID: The test for symmetry is not implemented for all matrix types. Looking at the code, it seems to only be SeqAIJ, but MatIsTranspose(A,A,...) would also work for MPIAIJ. On Sun, Dec 2, 2012 at 8:45 AM, w_ang_temp wrote: > Hello, > > I use MatIsSymmetric to know if the matrix A is symmetric. > > According to my model, it should be symmetric due to the theory. > > But I always get the result 'PetscBool *flg = 0', although I > > set 'tol' a large value(0.001). > > Because the matrix is of 20000 dimension, I can not output the > > matrix to the txt. So I want to konw if there is something to be paid > attention to > > about the function 'MatIsSymmetric' in version 3.2. Or do I have some > other ways > > to determine the symmetry.I think symmetry is one of the most important > thing > > in my analysis. > > Thanks. > > Jim > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From w_ang_temp at 163.com Sun Dec 2 12:09:06 2012 From: w_ang_temp at 163.com (w_ang_temp) Date: Mon, 3 Dec 2012 02:09:06 +0800 (CST) Subject: [petsc-users] Is there something to be paid attention to about MatIsSymmetric? 
In-Reply-To: References: <7c30a630.9645.13b5c1487f1.Coremail.w_ang_temp@163.com> Message-ID: <6113982e.243.13b5ccead36.Coremail.w_ang_temp@163.com> Maybe the matrix in my project is true unsymmetric. I use MatIsTranspose and get the same result. Maybe I need to check my constitutive model. >At 2012-12-02 23:10:51,"Jed Brown" wrote: >The test for symmetry is not implemented for all matrix types. Looking at the code, it seems to only be SeqAIJ, but MatIsTranspose(A,A,...) would also work >for MPIAIJ. >>On Sun, Dec 2, 2012 at 8:45 AM, w_ang_temp wrote: >>Hello, >> I use MatIsSymmetric to know if the matrix A is symmetric. >>According to my model, it should be symmetric due to the theory. >>But I always get the result 'PetscBool *flg = 0', although I >>set 'tol' a large value(0.001). >> Because the matrix is of 20000 dimension, I can not output the >>matrix to the txt. So I want to konw if there is something to be paid attention to >>about the function 'MatIsSymmetric' in version 3.2. Or do I have some other ways >>to determine the symmetry.I think symmetry is one of the most important thing >>in my analysis. >> Thanks. >> Jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sun Dec 2 12:18:23 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sun, 2 Dec 2012 12:18:23 -0600 Subject: [petsc-users] Is there something to be paid attention to about MatIsSymmetric? In-Reply-To: <6113982e.243.13b5ccead36.Coremail.w_ang_temp@163.com> References: <7c30a630.9645.13b5c1487f1.Coremail.w_ang_temp@163.com> <6113982e.243.13b5ccead36.Coremail.w_ang_temp@163.com> Message-ID: Check boundary conditions. For debugging, do MatTranspose() followed by MatAXPY() to see the difference A - A^T. On Sun, Dec 2, 2012 at 12:09 PM, w_ang_temp wrote: > Maybe the matrix in my project is true unsymmetric. I use MatIsTranspose > and get > the same result. Maybe I need to check my constitutive model. > -------------- next part -------------- An HTML attachment was scrubbed... 
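(A concrete version of Jed's MatTranspose()/MatAXPY() suggestion above, as an untested sketch with error checking trimmed; A is assumed to be the assembled MPIAIJ matrix. MatIsTranspose() gives a yes/no answer, while the explicit difference shows how large the asymmetry actually is, which helps distinguish a genuinely unsymmetric operator from, say, boundary-condition rows modified on only one side.)

   Mat            At;
   PetscReal      nrm;
   PetscBool      flg;
   PetscErrorCode ierr;

   ierr = MatIsTranspose(A,A,1.e-8,&flg);CHKERRQ(ierr);               /* PETSC_TRUE if A equals A^T up to tol */

   ierr = MatTranspose(A,MAT_INITIAL_MATRIX,&At);CHKERRQ(ierr);
   ierr = MatAXPY(At,-1.0,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); /* At <- A^T - A */
   ierr = MatNorm(At,NORM_FROBENIUS,&nrm);CHKERRQ(ierr);
   ierr = PetscPrintf(PETSC_COMM_WORLD,"||A^T - A||_F = %G\n",nrm);CHKERRQ(ierr);
   ierr = MatDestroy(&At);CHKERRQ(ierr);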
URL: From agrayver at gfz-potsdam.de Mon Dec 3 06:37:30 2012 From: agrayver at gfz-potsdam.de (Alexander Grayver) Date: Mon, 03 Dec 2012 13:37:30 +0100 Subject: [petsc-users] valgrind complains about string functions Message-ID: <50BC9D0A.2040803@gfz-potsdam.de> Hello, I'm using PETSc-3.3-p4 compiled with ICC 12.0 + IntelMPI 4.0.3 and getting a bunch of the errors related to the string functions: ==22020== Conditional jump or move depends on uninitialised value(s) ==22020== at 0x4D3109: __intel_sse2_strcpy (in /home/main) ==22020== by 0xE87D51D: PetscStrcpy (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE87B6A4: PetscStrallocpy (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE796769: PetscFListGetPathAndFunction (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE79652A: PetscFListAdd (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE64ACB8: MatMFFDRegister (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE64FA7D: MatMFFDRegisterAll (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE64F65B: MatMFFDInitializePackage (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE48D8C2: MatInitializePackage (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE5157DB: MatCreate (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE29A74C: MatCreateSeqAIJ (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) Same thing for PetscStrncat etc. There was similar question two years ago in this mailing list and advice was to use a different compiler. It is not an option for me. Thus, my question is can those errors potentially cause any serious troubles? I came across with time trying to debug a weird segmentation fault. Thanks. -- Regards, Alexander From tim.gallagher at gatech.edu Mon Dec 3 07:47:45 2012 From: tim.gallagher at gatech.edu (Tim Gallagher) Date: Mon, 3 Dec 2012 08:47:45 -0500 (EST) Subject: [petsc-users] valgrind complains about string functions In-Reply-To: <50BC9D0A.2040803@gfz-potsdam.de> Message-ID: <1358153325.5949105.1354542465447.JavaMail.root@mail.gatech.edu> This is a known bug in Valgrind and aside from being annoying and making it darn near impossible to find real problems, there's nothing that can be done about it. 
Tim ----- Original Message ----- From: "Alexander Grayver" To: "PETSc users list" Sent: Monday, December 3, 2012 7:37:30 AM Subject: [petsc-users] valgrind complains about string functions Hello, I'm using PETSc-3.3-p4 compiled with ICC 12.0 + IntelMPI 4.0.3 and getting a bunch of the errors related to the string functions: ==22020== Conditional jump or move depends on uninitialised value(s) ==22020== at 0x4D3109: __intel_sse2_strcpy (in /home/main) ==22020== by 0xE87D51D: PetscStrcpy (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE87B6A4: PetscStrallocpy (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE796769: PetscFListGetPathAndFunction (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE79652A: PetscFListAdd (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE64ACB8: MatMFFDRegister (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE64FA7D: MatMFFDRegisterAll (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE64F65B: MatMFFDInitializePackage (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE48D8C2: MatInitializePackage (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE5157DB: MatCreate (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) ==22020== by 0xE29A74C: MatCreateSeqAIJ (in /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) Same thing for PetscStrncat etc. There was similar question two years ago in this mailing list and advice was to use a different compiler. It is not an option for me. Thus, my question is can those errors potentially cause any serious troubles? I came across with time trying to debug a weird segmentation fault. Thanks. -- Regards, Alexander From jedbrown at mcs.anl.gov Mon Dec 3 09:51:26 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Mon, 3 Dec 2012 07:51:26 -0800 Subject: [petsc-users] valgrind complains about string functions In-Reply-To: <1358153325.5949105.1354542465447.JavaMail.root@mail.gatech.edu> References: <50BC9D0A.2040803@gfz-potsdam.de> <1358153325.5949105.1354542465447.JavaMail.root@mail.gatech.edu> Message-ID: Specifically, Intel's vectorized string routines are reading partway into uninitialized memory, branching on the result, but doing so in a way that makes the result independent of what was there (assuming null-terminated string). You can make a Valgrind suppression for it. On Mon, Dec 3, 2012 at 5:47 AM, Tim Gallagher wrote: > This is a known bug in Valgrind and aside from being annoying and making > it darn near impossible to find real problems, there's nothing that can be > done about it. 
> > Tim > > ----- Original Message ----- > From: "Alexander Grayver" > To: "PETSc users list" > Sent: Monday, December 3, 2012 7:37:30 AM > Subject: [petsc-users] valgrind complains about string functions > > Hello, > > I'm using PETSc-3.3-p4 compiled with ICC 12.0 + IntelMPI 4.0.3 and > getting a bunch of the errors related to the string functions: > > ==22020== Conditional jump or move depends on uninitialised value(s) > ==22020== at 0x4D3109: __intel_sse2_strcpy (in /home/main) > ==22020== by 0xE87D51D: PetscStrcpy (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE87B6A4: PetscStrallocpy (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE796769: PetscFListGetPathAndFunction (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE79652A: PetscFListAdd (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE64ACB8: MatMFFDRegister (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE64FA7D: MatMFFDRegisterAll (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE64F65B: MatMFFDInitializePackage (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE48D8C2: MatInitializePackage (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE5157DB: MatCreate (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE29A74C: MatCreateSeqAIJ (in > > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > > Same thing for PetscStrncat etc. > > There was similar question two years ago in this mailing list and advice > was to use a different compiler. It is not an option for me. > Thus, my question is can those errors potentially cause any serious > troubles? I came across with time trying to debug a weird segmentation > fault. > > Thanks. > > -- > Regards, > Alexander > > -------------- next part -------------- An HTML attachment was scrubbed... 
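(To expand on the suppression idea: an entry of roughly the following shape in a file, say intel-string.supp, silences these Intel string-routine reports. The name on the first line is arbitrary and the frame list is an illustration, not copied from a tested file; running valgrind once with --gen-suppressions=all prints ready-made blocks that can be pasted in instead.)

   {
      intel-sse2-string-uninit
      Memcheck:Cond
      fun:__intel_sse2_str*
      ...
   }

and then, for example:

   mpiexec -n 2 valgrind --suppressions=intel-string.supp ./myapp -malloc off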
URL: From balay at mcs.anl.gov Mon Dec 3 10:53:28 2012 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 3 Dec 2012 10:53:28 -0600 (CST) Subject: [petsc-users] valgrind complains about string functions In-Reply-To: <50BC9D0A.2040803@gfz-potsdam.de> References: <50BC9D0A.2040803@gfz-potsdam.de> Message-ID: On Mon, 3 Dec 2012, Alexander Grayver wrote: > Hello, > > I'm using PETSc-3.3-p4 compiled with ICC 12.0 + IntelMPI 4.0.3 and getting a > bunch of the errors related to the string functions: > > ==22020== Conditional jump or move depends on uninitialised value(s) > ==22020== at 0x4D3109: __intel_sse2_strcpy (in /home/main) > ==22020== by 0xE87D51D: PetscStrcpy (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE87B6A4: PetscStrallocpy (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE796769: PetscFListGetPathAndFunction (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE79652A: PetscFListAdd (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE64ACB8: MatMFFDRegister (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE64FA7D: MatMFFDRegisterAll (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE64F65B: MatMFFDInitializePackage (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE48D8C2: MatInitializePackage (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE5157DB: MatCreate (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > ==22020== by 0xE29A74C: MatCreateSeqAIJ (in > /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) > > Same thing for PetscStrncat etc. > > There was similar question two years ago in this mailing list and advice was > to use a different compiler. It is not an option for me. You can always use a separate build of PETSc with gcc,--download-mpich to get a valgrind clean build [for debugging purposes] > Thus, my question is can those errors potentially cause any serious troubles? Generally we can ignore issues valgrind finds in system/compiler libraries. [Jed has a valid explanation for this one]. And generally valgrind provides 'default suppression files' for known glibc versions. But for such issues as with ifc, you can ask valgrind to create a supression file - and then rerun valgrind with this custom supression file - to get more readable output. Satish > I came across with time trying to debug a weird segmentation fault. > > Thanks. > > From fande.kong at colorado.edu Mon Dec 3 12:38:18 2012 From: fande.kong at colorado.edu (Fande Kong) Date: Mon, 3 Dec 2012 11:38:18 -0700 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? Message-ID: Hi all, Can anyone guess the possible reason of the following errors: [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in src/sys/utils/mpimesg.c [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in src/vec/vec/utils/vpscat.c [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c I have been working for several days to figure out the reason, but now I still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I tried to use vecscatter to distribute the mesh. 
When the mesh was small, everything was ok. But when the mesh became larger about 14,000,000 elements, I got the above errors. -- Fande Kong Department of Computer Science University of Colorado at Boulder -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Dec 3 12:41:16 2012 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 3 Dec 2012 12:41:16 -0600 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? In-Reply-To: References: Message-ID: On Mon, Dec 3, 2012 at 12:38 PM, Fande Kong wrote: > Hi all, > > Can anyone guess the possible reason of the following errors: > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in > src/sys/utils/mpimesg.c > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in > src/vec/vec/utils/vpscat.c > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c Partial error messages are generally not helpful. Matt > I have been working for several days to figure out the reason, but now I > still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I tried to > use vecscatter to distribute the mesh. When the mesh was small, everything > was ok. But when the mesh became larger about 14,000,000 elements, I got the > above errors. > > -- > Fande Kong > Department of Computer Science > University of Colorado at Boulder > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From slivkaje at gmail.com Mon Dec 3 13:08:54 2012 From: slivkaje at gmail.com (Jelena Slivka) Date: Mon, 3 Dec 2012 14:08:54 -0500 Subject: [petsc-users] Solving A*X = B where A and B are matrices In-Reply-To: <55DD94DF-150F-4917-AA26-C0680107E065@mcs.anl.gov> References: <55DD94DF-150F-4917-AA26-C0680107E065@mcs.anl.gov> Message-ID: Thank you very much! However, I have another question. I have a cluster of 4 nodes and each node has 6 cores. If I run my code using 6 cores on one node (using the command "mpiexec -n 6") it is much faster than running it on just one process (which is expected). However, if I try running the code on multiple nodes (using "mpiexec -f machinefile -ppn 4", where machinefile is the file which contains the node names), it runs much slower than on just one process. This also happens with tutorial examples. I have checked the number of iteration for KSP solver when spread on multiple processors and it doesn't seem to be the problem. Do you have any suggestions on what am I doing wrong? Are the commands I am using wrong? On Sat, Dec 1, 2012 at 6:03 PM, Barry Smith wrote: > > We recommend following the directions > http://www.mcs.anl.gov/petsc/documentation/faq.html#schurcomplement for > computing a Schur complement; just skip the unneeded step. MUMPS supports a > parallel Cholesky but you can also use a parallel LU with MUMPS, PaSTIX or > SuperLU_Dist and those will work fine also. With current software Cholesky > in parallel is not tons better than LU so generally not worth monkeying > with. > > Barry > > > On Dec 1, 2012, at 12:05 PM, Jelena Slivka wrote: > > > Hello! > > I am trying to solve A*X = B where A and B are matrices, and then find > trace of the resulting matrix X. My approach has been to partition matrix B > in column vectors bi and then solve each system A*xi = bi. Then, for all > vectors xi I would extract i-th element xi(i) and sum those elements in > order to get Trace(X). 
> > Pseudo-code: > > 1) load matrices A and B > > 2) transpose matrix B (so that each right-hand side bi is in the row, as > operation MatGetColumnVector is slow) > > 3) set up KSPSolve > > 4) create vector diagonal (in which xi(i) elements will be stored) > > 5) for each row i of matrix B owned by current process: > > - create vector bi by extracting row i from matrix B > > - apply KSPsolve to get xi > > - insert value xi(i) in diagonal vector (only the process which > > holds the ith value of vector x(i) should do so) > > 6) sum vector diagonal to get the trace. > > However, my code (attached, along with the test case) runs fine on one > process, but hangs if started on multiple processes. Could you please help > me figure out what am I doing wrong? > > Also, could you please tell me is it possible to use Cholesky > factorization when running on multiple processes (I see that I cannot use > it when I set the format of matrix A to MPIAIJ)? > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fd.kong at siat.ac.cn Mon Dec 3 13:12:53 2012 From: fd.kong at siat.ac.cn (Fande Kong) Date: Mon, 3 Dec 2012 12:12:53 -0700 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? In-Reply-To: References: Message-ID: More details for the errors: [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in src/sys/utils/mpimesg.c [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in src/vec/vec/utils/vpscat.c [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c [0]PETSC ERROR: SpmcsSFCreateVecScatter() line 96 in SpmcsSFComm.cpp [0]PETSC ERROR: moveDataBetweenRootsAndLeaves() line 133 in SpmcsSFComm.cpp [0]PETSC ERROR: SpmcsSFCreateNormalizedEmbeddedSF() line 359 in SpmcsSFComm.cpp [0]PETSC ERROR: SpmcsSFDistributeSection() line 343 in SpmcsSection.cpp [0]PETSC ERROR: SpmcsMeshDistribute() line 444 in distributeMesh.cpp [0]PETSC ERROR: DMmeshInitialize() line 32 in mgInitialize.cpp [0]PETSC ERROR: main() line 64 in linearElasticity3d.cpp application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 ===================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = EXIT CODE: 256 = CLEANING UP REMAINING PROCESSES = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES ===================================================================================== [proxy:0:1 at node1778] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:1 at node1778] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:1 at node1778] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:2 at node1777] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:2 at node1777] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:2 at node1777] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:3 at node1773] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:3 at node1773] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:3 at node1773] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:4 at node1770] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:4 at node1770] HYDT_dmxu_poll_wait_for_event 
(./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:4 at node1770] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:6 at node1760] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:6 at node1760] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:6 at node1760] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:7 at node1758] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:7 at node1758] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:7 at node1758] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:8 at node1738] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:8 at node1738] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:8 at node1738] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:9 at node1736] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:9 at node1736] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:9 at node1736] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:10 at node1668] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:10 at node1668] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:10 at node1668] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:11 at node1667] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:11 at node1667] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:11 at node1667] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:12 at node1658] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:12 at node1658] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:12 at node1658] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:13 at node1656] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:13 at node1656] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:13 at node1656] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:14 at node1637] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:14 at node1637] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:14 at node1637] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:15 at node1636] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:15 at node1636] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:15 at node1636] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:16 at node1611] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:16 at node1611] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): 
callback returned error status [proxy:0:16 at node1611] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:17 at node1380] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:17 at node1380] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:17 at node1380] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:18 at node1379] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:18 at node1379] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:18 at node1379] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:19 at node1378] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:19 at node1378] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:19 at node1378] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:20 at node1377] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:20 at node1377] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:20 at node1377] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:21 at node1376] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:21 at node1376] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:21 at node1376] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:22 at node1375] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:22 at node1375] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:22 at node1375] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:23 at node1374] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:23 at node1374] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:23 at node1374] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:24 at node1373] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:24 at node1373] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:24 at node1373] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:25 at node1372] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:25 at node1372] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:25 at node1372] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:26 at node1371] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:26 at node1371] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:26 at node1371] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:27 at node1370] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:27 at node1370] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error 
status [proxy:0:27 at node1370] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:28 at node1369] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:28 at node1369] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:28 at node1369] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:29 at node1368] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:29 at node1368] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:29 at node1368] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:30 at node1367] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:30 at node1367] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:30 at node1367] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [proxy:0:31 at node1366] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed [proxy:0:31 at node1366] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:31 at node1366] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event [mpiexec at node1780] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting [mpiexec at node1780] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion [mpiexec at node1780] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for completion [mpiexec at node1780] main (./ui/mpich/mpiexec.c:385): process manager error waiting for completion It seems nothing. On Mon, Dec 3, 2012 at 11:41 AM, Matthew Knepley wrote: > On Mon, Dec 3, 2012 at 12:38 PM, Fande Kong > wrote: > > Hi all, > > > > Can anyone guess the possible reason of the following errors: > > > > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in > > src/sys/utils/mpimesg.c > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in > > src/vec/vec/utils/vpscat.c > > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c > > Partial error messages are generally not helpful. > > Matt > > > I have been working for several days to figure out the reason, but now I > > still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I > tried to > > use vecscatter to distribute the mesh. When the mesh was small, > everything > > was ok. But when the mesh became larger about 14,000,000 elements, I got > the > > above errors. > > > > -- > > Fande Kong > > Department of Computer Science > > University of Colorado at Boulder > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > -- Fande Kong ShenZhen Institutes of Advanced Technology Chinese Academy of Sciences -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Dec 3 13:19:12 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 3 Dec 2012 13:19:12 -0600 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? 
In-Reply-To: References: Message-ID: <3416FED3-493A-42E6-83BB-EB661E69A90B@mcs.anl.gov> Perhaps some bad data is being passed into VecScatterCreate(). I would suggest having SpmcsSFCreateVecScatter validate the IS's and Vecs being passed in. For example, do the IS have tons of duplicates, how long are they etc? Barry On Dec 3, 2012, at 1:12 PM, Fande Kong wrote: > More details for the errors: > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in src/sys/utils/mpimesg.c > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in src/vec/vec/utils/vpscat.c > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c > [0]PETSC ERROR: SpmcsSFCreateVecScatter() line 96 in SpmcsSFComm.cpp > [0]PETSC ERROR: moveDataBetweenRootsAndLeaves() line 133 in SpmcsSFComm.cpp > [0]PETSC ERROR: SpmcsSFCreateNormalizedEmbeddedSF() line 359 in SpmcsSFComm.cpp > [0]PETSC ERROR: SpmcsSFDistributeSection() line 343 in SpmcsSection.cpp > [0]PETSC ERROR: SpmcsMeshDistribute() line 444 in distributeMesh.cpp > [0]PETSC ERROR: DMmeshInitialize() line 32 in mgInitialize.cpp > [0]PETSC ERROR: main() line 64 in linearElasticity3d.cpp > application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 > > ===================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = EXIT CODE: 256 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > ===================================================================================== > [proxy:0:1 at node1778] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:1 at node1778] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:1 at node1778] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:2 at node1777] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:2 at node1777] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:2 at node1777] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:3 at node1773] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:3 at node1773] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:3 at node1773] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:4 at node1770] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:4 at node1770] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:4 at node1770] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:6 at node1760] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:6 at node1760] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:6 at node1760] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:7 at node1758] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:7 at node1758] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:7 at node1758] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:8 at node1738] HYD_pmcd_pmip_control_cmd_cb 
(./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> [... quoted proxy error lines for the remaining nodes trimmed; they repeat the log shown earlier in this thread ...]
> [proxy:0:29 at node1368] main (./pm/pmiserv/pmip.c:214): demux
engine error waiting for event > [proxy:0:30 at node1367] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:30 at node1367] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:30 at node1367] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [proxy:0:31 at node1366] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > [proxy:0:31 at node1366] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [proxy:0:31 at node1366] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > [mpiexec at node1780] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting > [mpiexec at node1780] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion > [mpiexec at node1780] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for completion > [mpiexec at node1780] main (./ui/mpich/mpiexec.c:385): process manager error waiting for completion > > It seems nothing. > > On Mon, Dec 3, 2012 at 11:41 AM, Matthew Knepley wrote: > On Mon, Dec 3, 2012 at 12:38 PM, Fande Kong wrote: > > Hi all, > > > > Can anyone guess the possible reason of the following errors: > > > > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in > > src/sys/utils/mpimesg.c > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in > > src/vec/vec/utils/vpscat.c > > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c > > Partial error messages are generally not helpful. > > Matt > > > I have been working for several days to figure out the reason, but now I > > still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I tried to > > use vecscatter to distribute the mesh. When the mesh was small, everything > > was ok. But when the mesh became larger about 14,000,000 elements, I got the > > above errors. > > > > -- > > Fande Kong > > Department of Computer Science > > University of Colorado at Boulder > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > > > > -- > Fande Kong > ShenZhen Institutes of Advanced Technology > Chinese Academy of Sciences > From knepley at gmail.com Mon Dec 3 13:20:54 2012 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 3 Dec 2012 13:20:54 -0600 Subject: [petsc-users] Solving A*X = B where A and B are matrices In-Reply-To: References: <55DD94DF-150F-4917-AA26-C0680107E065@mcs.anl.gov> Message-ID: On Mon, Dec 3, 2012 at 1:08 PM, Jelena Slivka wrote: > Thank you very much! > However, I have another question. I have a cluster of 4 nodes and each node > has 6 cores. If I run my code using 6 cores on one node (using the command > "mpiexec -n 6") it is much faster than running it on just one process (which > is expected). However, if I try running the code on multiple nodes (using > "mpiexec -f machinefile -ppn 4", where machinefile is the file which > contains the node names), it runs much slower than on just one process. This > also happens with tutorial examples. I have checked the number of iteration > for KSP solver when spread on multiple processors and it doesn't seem to be > the problem. Do you have any suggestions on what am I doing wrong? 
Are the > commands I am using wrong? Most operations are memory bandwidth limited, and it sounds like the memory bandwidth for your cluster is maxed out by 1-2 procs. Matt > On Sat, Dec 1, 2012 at 6:03 PM, Barry Smith wrote: >> >> >> We recommend following the directions >> http://www.mcs.anl.gov/petsc/documentation/faq.html#schurcomplement for >> computing a Schur complement; just skip the unneeded step. MUMPS supports a >> parallel Cholesky but you can also use a parallel LU with MUMPS, PaSTIX or >> SuperLU_Dist and those will work fine also. With current software Cholesky >> in parallel is not tons better than LU so generally not worth monkeying >> with. >> >> Barry >> >> >> On Dec 1, 2012, at 12:05 PM, Jelena Slivka wrote: >> >> > Hello! >> > I am trying to solve A*X = B where A and B are matrices, and then find >> > trace of the resulting matrix X. My approach has been to partition matrix B >> > in column vectors bi and then solve each system A*xi = bi. Then, for all >> > vectors xi I would extract i-th element xi(i) and sum those elements in >> > order to get Trace(X). >> > Pseudo-code: >> > 1) load matrices A and B >> > 2) transpose matrix B (so that each right-hand side bi is in the row, as >> > operation MatGetColumnVector is slow) >> > 3) set up KSPSolve >> > 4) create vector diagonal (in which xi(i) elements will be stored) >> > 5) for each row i of matrix B owned by current process: >> > - create vector bi by extracting row i from matrix B >> > - apply KSPsolve to get xi >> > - insert value xi(i) in diagonal vector (only the process >> > which >> > holds the ith value of vector x(i) should do so) >> > 6) sum vector diagonal to get the trace. >> > However, my code (attached, along with the test case) runs fine on one >> > process, but hangs if started on multiple processes. Could you please help >> > me figure out what am I doing wrong? >> > Also, could you please tell me is it possible to use Cholesky >> > factorization when running on multiple processes (I see that I cannot use it >> > when I set the format of matrix A to MPIAIJ)? >> > >> > >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From bsmith at mcs.anl.gov Mon Dec 3 13:21:24 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 3 Dec 2012 13:21:24 -0600 Subject: [petsc-users] Solving A*X = B where A and B are matrices In-Reply-To: References: <55DD94DF-150F-4917-AA26-C0680107E065@mcs.anl.gov> Message-ID: <8C7CCD12-F869-4FE8-9DEA-0BBA1283DAEC@mcs.anl.gov> http://www.mcs.anl.gov/petsc/documentation/faq.html#computers On Dec 3, 2012, at 1:08 PM, Jelena Slivka wrote: > Thank you very much! > However, I have another question. I have a cluster of 4 nodes and each node has 6 cores. If I run my code using 6 cores on one node (using the command "mpiexec -n 6") it is much faster than running it on just one process (which is expected). However, if I try running the code on multiple nodes (using "mpiexec -f machinefile -ppn 4", where machinefile is the file which contains the node names), it runs much slower than on just one process. This also happens with tutorial examples. I have checked the number of iteration for KSP solver when spread on multiple processors and it doesn't seem to be the problem. Do you have any suggestions on what am I doing wrong? Are the commands I am using wrong? 
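(Matt's memory-bandwidth point and the FAQ entry above can be checked directly: run a STREAM-style triad with 1 rank, with 6 ranks on one node, and then spread across nodes with the machinefile. If the aggregate rate stops growing, KSPSolve will not speed up either, and going multi-node additionally pays network latency on every inner product and norm. A small self-contained MPI version is sketched below; it is illustrative and untuned, not part of PETSc.)

   #include <mpi.h>
   #include <stdio.h>
   #include <stdlib.h>

   #define N 10000000                       /* three arrays of doubles: ~240 MB per rank */

   int main(int argc,char **argv)
   {
     double *a,*b,*c,t0,t1,rate,total;
     int    i,rank;

     MPI_Init(&argc,&argv);
     MPI_Comm_rank(MPI_COMM_WORLD,&rank);
     a = (double*)malloc(N*sizeof(double));
     b = (double*)malloc(N*sizeof(double));
     c = (double*)malloc(N*sizeof(double));
     for (i=0; i<N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

     MPI_Barrier(MPI_COMM_WORLD);
     t0 = MPI_Wtime();
     for (i=0; i<N; i++) a[i] = b[i] + 3.0*c[i];        /* STREAM-style triad */
     t1 = MPI_Wtime();

     rate = 3.0*N*sizeof(double)/(t1-t0)/1.0e6;         /* MB/s moved by this rank */
     MPI_Reduce(&rate,&total,1,MPI_DOUBLE,MPI_SUM,0,MPI_COMM_WORLD);
     if (!rank) printf("aggregate triad rate: %.0f MB/s (check value %g)\n",total,a[N-1]);

     free(a); free(b); free(c);
     MPI_Finalize();
     return 0;
   }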
> > > On Sat, Dec 1, 2012 at 6:03 PM, Barry Smith wrote: > > We recommend following the directions http://www.mcs.anl.gov/petsc/documentation/faq.html#schurcomplement for computing a Schur complement; just skip the unneeded step. MUMPS supports a parallel Cholesky but you can also use a parallel LU with MUMPS, PaSTIX or SuperLU_Dist and those will work fine also. With current software Cholesky in parallel is not tons better than LU so generally not worth monkeying with. > > Barry > > > On Dec 1, 2012, at 12:05 PM, Jelena Slivka wrote: > > > Hello! > > I am trying to solve A*X = B where A and B are matrices, and then find trace of the resulting matrix X. My approach has been to partition matrix B in column vectors bi and then solve each system A*xi = bi. Then, for all vectors xi I would extract i-th element xi(i) and sum those elements in order to get Trace(X). > > Pseudo-code: > > 1) load matrices A and B > > 2) transpose matrix B (so that each right-hand side bi is in the row, as operation MatGetColumnVector is slow) > > 3) set up KSPSolve > > 4) create vector diagonal (in which xi(i) elements will be stored) > > 5) for each row i of matrix B owned by current process: > > - create vector bi by extracting row i from matrix B > > - apply KSPsolve to get xi > > - insert value xi(i) in diagonal vector (only the process which > > holds the ith value of vector x(i) should do so) > > 6) sum vector diagonal to get the trace. > > However, my code (attached, along with the test case) runs fine on one process, but hangs if started on multiple processes. Could you please help me figure out what am I doing wrong? > > Also, could you please tell me is it possible to use Cholesky factorization when running on multiple processes (I see that I cannot use it when I set the format of matrix A to MPIAIJ)? > > > > > > From fd.kong at siat.ac.cn Mon Dec 3 13:23:39 2012 From: fd.kong at siat.ac.cn (Fande Kong) Date: Mon, 3 Dec 2012 12:23:39 -0700 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? In-Reply-To: <3416FED3-493A-42E6-83BB-EB661E69A90B@mcs.anl.gov> References: <3416FED3-493A-42E6-83BB-EB661E69A90B@mcs.anl.gov> Message-ID: Are there any constraints for IS and Vec? On Mon, Dec 3, 2012 at 12:19 PM, Barry Smith wrote: > > Perhaps some bad data is being passed into VecScatterCreate(). I would > suggest having SpmcsSFCreateVecScatter > validate the IS's and Vecs being passed in. For example, do the IS have > tons of duplicates, how long are they etc? 
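(A quick way to do the validation Barry suggests, as an untested sketch to drop in just before the VecScatterCreate() call inside SpmcsSFCreateVecScatter(); the names ix and x below stand for whichever IS and Vec are actually passed in, and ierr/CHKERRQ are assumed to be set up as usual.)

   PetscInt        nloc,nglob,imin,imax,xN,j,ndup = 0;
   PetscBool       sorted;
   PetscMPIInt     rank;
   IS              tmp;
   const PetscInt *idx;

   ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
   ierr = ISGetLocalSize(ix,&nloc);CHKERRQ(ierr);
   ierr = ISGetSize(ix,&nglob);CHKERRQ(ierr);
   ierr = ISGetMinMax(ix,&imin,&imax);CHKERRQ(ierr);
   ierr = ISSorted(ix,&sorted);CHKERRQ(ierr);
   ierr = VecGetSize(x,&xN);CHKERRQ(ierr);

   /* count duplicate indices in the local part of the IS */
   ierr = ISDuplicate(ix,&tmp);CHKERRQ(ierr);
   ierr = ISSort(tmp);CHKERRQ(ierr);
   ierr = ISGetIndices(tmp,&idx);CHKERRQ(ierr);
   for (j=1; j<nloc; j++) if (idx[j] == idx[j-1]) ndup++;
   ierr = ISRestoreIndices(tmp,&idx);CHKERRQ(ierr);
   ierr = ISDestroy(&tmp);CHKERRQ(ierr);

   ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
            "[%d] ix: local %D global %D min %D max %D sorted %d duplicates %D (x global size %D)\n",
            rank,nloc,nglob,imin,imax,(int)sorted,ndup,xN);CHKERRQ(ierr);
   ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD);CHKERRQ(ierr);
   if (imin < 0 || imax >= xN) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_OUTOFRANGE,"IS points outside the vector");

Indices outside the vector, huge duplicate counts, or wildly unbalanced local sizes are the usual culprits when VecScatterCreate() only fails past a certain problem size; on a 14,000,000-element mesh it may also be worth ruling out PetscInt overflow by configuring with --with-64-bit-indices.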
> > Barry > > On Dec 3, 2012, at 1:12 PM, Fande Kong wrote: > > > More details for the errors: > > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in > src/sys/utils/mpimesg.c > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in > src/vec/vec/utils/vpscat.c > > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c > > [0]PETSC ERROR: SpmcsSFCreateVecScatter() line 96 in SpmcsSFComm.cpp > > [0]PETSC ERROR: moveDataBetweenRootsAndLeaves() line 133 in > SpmcsSFComm.cpp > > [0]PETSC ERROR: SpmcsSFCreateNormalizedEmbeddedSF() line 359 in > SpmcsSFComm.cpp > > [0]PETSC ERROR: SpmcsSFDistributeSection() line 343 in SpmcsSection.cpp > > [0]PETSC ERROR: SpmcsMeshDistribute() line 444 in distributeMesh.cpp > > [0]PETSC ERROR: DMmeshInitialize() line 32 in mgInitialize.cpp > > [0]PETSC ERROR: main() line 64 in linearElasticity3d.cpp > > application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 > > > > > ===================================================================================== > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > > = EXIT CODE: 256 > > = CLEANING UP REMAINING PROCESSES > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > > ===================================================================================== > > [proxy:0:1 at node1778] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:1 at node1778] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:1 at node1778] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:2 at node1777] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:2 at node1777] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:2 at node1777] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:3 at node1773] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:3 at node1773] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:3 at node1773] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:4 at node1770] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:4 at node1770] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:4 at node1770] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:6 at node1760] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:6 at node1760] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:6 at node1760] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:7 at node1758] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:7 at node1758] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:7 at node1758] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:8 at node1738] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:8 at node1738] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:8 
at node1738] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:9 at node1736] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:9 at node1736] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:9 at node1736] main (./pm/pmiserv/pmip.c:214): demux engine error > waiting for event > > [proxy:0:10 at node1668] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:10 at node1668] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:10 at node1668] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:11 at node1667] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:11 at node1667] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:11 at node1667] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:12 at node1658] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:12 at node1658] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:12 at node1658] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:13 at node1656] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:13 at node1656] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:13 at node1656] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:14 at node1637] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:14 at node1637] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:14 at node1637] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:15 at node1636] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:15 at node1636] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:15 at node1636] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:16 at node1611] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:16 at node1611] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:16 at node1611] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:17 at node1380] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:17 at node1380] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:17 at node1380] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:18 at node1379] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:18 at node1379] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:18 at node1379] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:19 at node1378] HYD_pmcd_pmip_control_cmd_cb > 
(./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:19 at node1378] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:19 at node1378] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:20 at node1377] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:20 at node1377] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:20 at node1377] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:21 at node1376] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:21 at node1376] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:21 at node1376] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:22 at node1375] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:22 at node1375] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:22 at node1375] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:23 at node1374] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:23 at node1374] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:23 at node1374] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:24 at node1373] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:24 at node1373] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:24 at node1373] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:25 at node1372] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:25 at node1372] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:25 at node1372] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:26 at node1371] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:26 at node1371] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:26 at node1371] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:27 at node1370] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:27 at node1370] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:27 at node1370] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:28 at node1369] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:28 at node1369] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:28 at node1369] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:29 at node1368] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:29 at node1368] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback 
returned error status > > [proxy:0:29 at node1368] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:30 at node1367] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:30 at node1367] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:30 at node1367] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [proxy:0:31 at node1366] HYD_pmcd_pmip_control_cmd_cb > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:31 at node1366] HYDT_dmxu_poll_wait_for_event > (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:31 at node1366] main (./pm/pmiserv/pmip.c:214): demux engine > error waiting for event > > [mpiexec at node1780] HYDT_bscu_wait_for_completion > (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated > badly; aborting > > [mpiexec at node1780] HYDT_bsci_wait_for_completion > (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for > completion > > [mpiexec at node1780] HYD_pmci_wait_for_completion > (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for > completion > > [mpiexec at node1780] main (./ui/mpich/mpiexec.c:385): process manager > error waiting for completion > > > > It seems nothing. > > > > On Mon, Dec 3, 2012 at 11:41 AM, Matthew Knepley > wrote: > > On Mon, Dec 3, 2012 at 12:38 PM, Fande Kong > wrote: > > > Hi all, > > > > > > Can anyone guess the possible reason of the following errors: > > > > > > > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in > > > src/sys/utils/mpimesg.c > > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in > > > src/vec/vec/utils/vpscat.c > > > [0]PETSC ERROR: VecScatterCreate() line 1431 in > src/vec/vec/utils/vscat.c > > > > Partial error messages are generally not helpful. > > > > Matt > > > > > I have been working for several days to figure out the reason, but now > I > > > still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I > tried to > > > use vecscatter to distribute the mesh. When the mesh was small, > everything > > > was ok. But when the mesh became larger about 14,000,000 elements, I > got the > > > above errors. > > > > > > -- > > > Fande Kong > > > Department of Computer Science > > > University of Colorado at Boulder > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > > > > > > > -- > > Fande Kong > > ShenZhen Institutes of Advanced Technology > > Chinese Academy of Sciences > > > > > -- Fande Kong ShenZhen Institutes of Advanced Technology Chinese Academy of Sciences -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Dec 3 13:25:01 2012 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 3 Dec 2012 13:25:01 -0600 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? In-Reply-To: References: <3416FED3-493A-42E6-83BB-EB661E69A90B@mcs.anl.gov> Message-ID: On Mon, Dec 3, 2012 at 1:23 PM, Fande Kong wrote: > Are there any constraints for IS and Vec? No, but this appears to be inconsistency. Matt > On Mon, Dec 3, 2012 at 12:19 PM, Barry Smith wrote: >> >> >> Perhaps some bad data is being passed into VecScatterCreate(). 
I would >> suggest having SpmcsSFCreateVecScatter >> validate the IS's and Vecs being passed in. For example, do the IS have >> tons of duplicates, how long are they etc? >> >> Barry >> >> On Dec 3, 2012, at 1:12 PM, Fande Kong wrote: >> >> > More details for the errors: >> > >> > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in >> > src/sys/utils/mpimesg.c >> > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in >> > src/vec/vec/utils/vpscat.c >> > [0]PETSC ERROR: VecScatterCreate() line 1431 in >> > src/vec/vec/utils/vscat.c >> > [0]PETSC ERROR: SpmcsSFCreateVecScatter() line 96 in SpmcsSFComm.cpp >> > [0]PETSC ERROR: moveDataBetweenRootsAndLeaves() line 133 in >> > SpmcsSFComm.cpp >> > [0]PETSC ERROR: SpmcsSFCreateNormalizedEmbeddedSF() line 359 in >> > SpmcsSFComm.cpp >> > [0]PETSC ERROR: SpmcsSFDistributeSection() line 343 in SpmcsSection.cpp >> > [0]PETSC ERROR: SpmcsMeshDistribute() line 444 in distributeMesh.cpp >> > [0]PETSC ERROR: DMmeshInitialize() line 32 in mgInitialize.cpp >> > [0]PETSC ERROR: main() line 64 in linearElasticity3d.cpp >> > application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 >> > >> > >> > ===================================================================================== >> > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >> > = EXIT CODE: 256 >> > = CLEANING UP REMAINING PROCESSES >> > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >> > >> > ===================================================================================== >> > [proxy:0:1 at node1778] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:1 at node1778] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:1 at node1778] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:2 at node1777] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:2 at node1777] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:2 at node1777] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:3 at node1773] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:3 at node1773] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:3 at node1773] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:4 at node1770] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:4 at node1770] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:4 at node1770] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:6 at node1760] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:6 at node1760] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:6 at node1760] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:7 at node1758] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:7 at node1758] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:7 at node1758] main (./pm/pmiserv/pmip.c:214): 
demux engine error >> > waiting for event >> > [proxy:0:8 at node1738] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:8 at node1738] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:8 at node1738] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:9 at node1736] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:9 at node1736] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:9 at node1736] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:10 at node1668] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:10 at node1668] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:10 at node1668] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:11 at node1667] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:11 at node1667] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:11 at node1667] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:12 at node1658] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:12 at node1658] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:12 at node1658] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:13 at node1656] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:13 at node1656] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:13 at node1656] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:14 at node1637] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:14 at node1637] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:14 at node1637] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:15 at node1636] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:15 at node1636] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:15 at node1636] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:16 at node1611] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:16 at node1611] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:16 at node1611] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:17 at node1380] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:17 at node1380] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:17 at node1380] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > 
[proxy:0:18 at node1379] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:18 at node1379] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:18 at node1379] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:19 at node1378] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:19 at node1378] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:19 at node1378] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:20 at node1377] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:20 at node1377] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:20 at node1377] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:21 at node1376] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:21 at node1376] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:21 at node1376] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:22 at node1375] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:22 at node1375] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:22 at node1375] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:23 at node1374] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:23 at node1374] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:23 at node1374] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:24 at node1373] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:24 at node1373] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:24 at node1373] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:25 at node1372] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:25 at node1372] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:25 at node1372] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:26 at node1371] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:26 at node1371] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:26 at node1371] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:27 at node1370] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:27 at node1370] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:27 at node1370] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:28 at node1369] 
HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:28 at node1369] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:28 at node1369] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:29 at node1368] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:29 at node1368] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:29 at node1368] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:30 at node1367] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:30 at node1367] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:30 at node1367] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [proxy:0:31 at node1366] HYD_pmcd_pmip_control_cmd_cb >> > (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed >> > [proxy:0:31 at node1366] HYDT_dmxu_poll_wait_for_event >> > (./tools/demux/demux_poll.c:77): callback returned error status >> > [proxy:0:31 at node1366] main (./pm/pmiserv/pmip.c:214): demux engine error >> > waiting for event >> > [mpiexec at node1780] HYDT_bscu_wait_for_completion >> > (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated >> > badly; aborting >> > [mpiexec at node1780] HYDT_bsci_wait_for_completion >> > (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for >> > completion >> > [mpiexec at node1780] HYD_pmci_wait_for_completion >> > (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for >> > completion >> > [mpiexec at node1780] main (./ui/mpich/mpiexec.c:385): process manager >> > error waiting for completion >> > >> > It seems nothing. >> > >> > On Mon, Dec 3, 2012 at 11:41 AM, Matthew Knepley >> > wrote: >> > On Mon, Dec 3, 2012 at 12:38 PM, Fande Kong >> > wrote: >> > > Hi all, >> > > >> > > Can anyone guess the possible reason of the following errors: >> > > >> > > >> > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in >> > > src/sys/utils/mpimesg.c >> > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in >> > > src/vec/vec/utils/vpscat.c >> > > [0]PETSC ERROR: VecScatterCreate() line 1431 in >> > > src/vec/vec/utils/vscat.c >> > >> > Partial error messages are generally not helpful. >> > >> > Matt >> > >> > > I have been working for several days to figure out the reason, but now >> > > I >> > > still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I >> > > tried to >> > > use vecscatter to distribute the mesh. When the mesh was small, >> > > everything >> > > was ok. But when the mesh became larger about 14,000,000 elements, I >> > > got the >> > > above errors. >> > > >> > > -- >> > > Fande Kong >> > > Department of Computer Science >> > > University of Colorado at Boulder >> > > >> > > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments is infinitely more interesting than any results to which >> > their experiments lead. 
>> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > Fande Kong >> > ShenZhen Institutes of Advanced Technology >> > Chinese Academy of Sciences >> > >> >> > > > > -- > Fande Kong > ShenZhen Institutes of Advanced Technology > Chinese Academy of Sciences > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From bsmith at mcs.anl.gov Mon Dec 3 13:27:54 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 3 Dec 2012 13:27:54 -0600 Subject: [petsc-users] Can anyone guess the possible reason of the following errors? In-Reply-To: References: <3416FED3-493A-42E6-83BB-EB661E69A90B@mcs.anl.gov> Message-ID: <95B914B5-3DCB-4BD1-BB87-3ED00221847B@mcs.anl.gov> On Dec 3, 2012, at 1:23 PM, Fande Kong wrote: > Are there any constraints for IS and Vec? You could also run with the option -mpi_return_on_error false and MPI may print an error message of what it thinks has gone wrong. Barry > > On Mon, Dec 3, 2012 at 12:19 PM, Barry Smith wrote: > > Perhaps some bad data is being passed into VecScatterCreate(). I would suggest having SpmcsSFCreateVecScatter > validate the IS's and Vecs being passed in. For example, do the IS have tons of duplicates, how long are they etc? > > Barry > > On Dec 3, 2012, at 1:12 PM, Fande Kong wrote: > > > More details for the errors: > > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in src/sys/utils/mpimesg.c > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in src/vec/vec/utils/vpscat.c > > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c > > [0]PETSC ERROR: SpmcsSFCreateVecScatter() line 96 in SpmcsSFComm.cpp > > [0]PETSC ERROR: moveDataBetweenRootsAndLeaves() line 133 in SpmcsSFComm.cpp > > [0]PETSC ERROR: SpmcsSFCreateNormalizedEmbeddedSF() line 359 in SpmcsSFComm.cpp > > [0]PETSC ERROR: SpmcsSFDistributeSection() line 343 in SpmcsSection.cpp > > [0]PETSC ERROR: SpmcsMeshDistribute() line 444 in distributeMesh.cpp > > [0]PETSC ERROR: DMmeshInitialize() line 32 in mgInitialize.cpp > > [0]PETSC ERROR: main() line 64 in linearElasticity3d.cpp > > application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 > > > > ===================================================================================== > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > > = EXIT CODE: 256 > > = CLEANING UP REMAINING PROCESSES > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > ===================================================================================== > > [proxy:0:1 at node1778] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:1 at node1778] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:1 at node1778] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:2 at node1777] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:2 at node1777] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:2 at node1777] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:3 at node1773] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:3 at node1773] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:3 at node1773] main (./pm/pmiserv/pmip.c:214): demux engine error 
waiting for event > > [proxy:0:4 at node1770] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:4 at node1770] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:4 at node1770] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:6 at node1760] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:6 at node1760] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:6 at node1760] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:7 at node1758] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:7 at node1758] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:7 at node1758] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:8 at node1738] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:8 at node1738] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:8 at node1738] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:9 at node1736] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:9 at node1736] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:9 at node1736] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:10 at node1668] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:10 at node1668] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:10 at node1668] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:11 at node1667] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:11 at node1667] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:11 at node1667] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:12 at node1658] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:12 at node1658] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:12 at node1658] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:13 at node1656] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:13 at node1656] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:13 at node1656] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:14 at node1637] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:14 at node1637] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:14 at node1637] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:15 at node1636] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:15 at node1636] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned 
error status > > [proxy:0:15 at node1636] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:16 at node1611] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:16 at node1611] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:16 at node1611] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:17 at node1380] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:17 at node1380] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:17 at node1380] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:18 at node1379] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:18 at node1379] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:18 at node1379] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:19 at node1378] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:19 at node1378] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:19 at node1378] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:20 at node1377] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:20 at node1377] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:20 at node1377] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:21 at node1376] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:21 at node1376] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:21 at node1376] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:22 at node1375] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:22 at node1375] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:22 at node1375] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:23 at node1374] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:23 at node1374] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:23 at node1374] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:24 at node1373] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:24 at node1373] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:24 at node1373] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:25 at node1372] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:25 at node1372] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:25 at node1372] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:26 at node1371] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > 
> [proxy:0:26 at node1371] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:26 at node1371] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:27 at node1370] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:27 at node1370] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:27 at node1370] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:28 at node1369] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:28 at node1369] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:28 at node1369] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:29 at node1368] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:29 at node1368] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:29 at node1368] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:30 at node1367] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:30 at node1367] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:30 at node1367] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [proxy:0:31 at node1366] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed > > [proxy:0:31 at node1366] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > > [proxy:0:31 at node1366] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event > > [mpiexec at node1780] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting > > [mpiexec at node1780] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion > > [mpiexec at node1780] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for completion > > [mpiexec at node1780] main (./ui/mpich/mpiexec.c:385): process manager error waiting for completion > > > > It seems nothing. > > > > On Mon, Dec 3, 2012 at 11:41 AM, Matthew Knepley wrote: > > On Mon, Dec 3, 2012 at 12:38 PM, Fande Kong wrote: > > > Hi all, > > > > > > Can anyone guess the possible reason of the following errors: > > > > > > > > > [0]PETSC ERROR: PetscGatherMessageLengths() line 133 in > > > src/sys/utils/mpimesg.c > > > [0]PETSC ERROR: VecScatterCreate_PtoP() line 2188 in > > > src/vec/vec/utils/vpscat.c > > > [0]PETSC ERROR: VecScatterCreate() line 1431 in src/vec/vec/utils/vscat.c > > > > Partial error messages are generally not helpful. > > > > Matt > > > > > I have been working for several days to figure out the reason, but now I > > > still get nothing. I use Petsc-3.3-p3 based on the mvapich2-1.6. I tried to > > > use vecscatter to distribute the mesh. When the mesh was small, everything > > > was ok. But when the mesh became larger about 14,000,000 elements, I got the > > > above errors. 
> > > > > > -- > > > Fande Kong > > > Department of Computer Science > > > University of Colorado at Boulder > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > > > > > > > -- > > Fande Kong > > ShenZhen Institutes of Advanced Technology > > Chinese Academy of Sciences > > > > > > > > -- > Fande Kong > ShenZhen Institutes of Advanced Technology > Chinese Academy of Sciences > From agrayver at gfz-potsdam.de Tue Dec 4 05:07:33 2012 From: agrayver at gfz-potsdam.de (Alexander Grayver) Date: Tue, 04 Dec 2012 12:07:33 +0100 Subject: [petsc-users] valgrind complains about string functions In-Reply-To: References: <50BC9D0A.2040803@gfz-potsdam.de> Message-ID: <50BDD975.5010100@gfz-potsdam.de> Jed, Satish, suppression file is a nice option, thanks. On 03.12.2012 17:53, Satish Balay wrote: > On Mon, 3 Dec 2012, Alexander Grayver wrote: > >> Hello, >> >> I'm using PETSc-3.3-p4 compiled with ICC 12.0 + IntelMPI 4.0.3 and getting a >> bunch of the errors related to the string functions: >> >> ==22020== Conditional jump or move depends on uninitialised value(s) >> ==22020== at 0x4D3109: __intel_sse2_strcpy (in /home/main) >> ==22020== by 0xE87D51D: PetscStrcpy (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE87B6A4: PetscStrallocpy (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE796769: PetscFListGetPathAndFunction (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE79652A: PetscFListAdd (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE64ACB8: MatMFFDRegister (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE64FA7D: MatMFFDRegisterAll (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE64F65B: MatMFFDInitializePackage (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE48D8C2: MatInitializePackage (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE5157DB: MatCreate (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> ==22020== by 0xE29A74C: MatCreateSeqAIJ (in >> /home/lib/petsc-3.3-p4/intelmpi-intel12-double-release-c-shared/lib/libpetsc.so) >> >> Same thing for PetscStrncat etc. >> >> There was similar question two years ago in this mailing list and advice was >> to use a different compiler. It is not an option for me. > You can always use a separate build of PETSc with gcc,--download-mpich > to get a valgrind clean build [for debugging purposes] > >> Thus, my question is can those errors potentially cause any serious troubles? > Generally we can ignore issues valgrind finds in system/compiler > libraries. [Jed has a valid explanation for this one]. > > And generally valgrind provides 'default suppression files' for known > glibc versions. But for such issues as with ifc, you can ask valgrind > to create a supression file - and then rerun valgrind with this custom > supression file - to get more readable output. > > Satish > >> I came across with time trying to debug a weird segmentation fault. >> >> Thanks. 
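To make the suppression-file route concrete: valgrind can be run once with --gen-suppressions=all to print ready-made suppression blocks, which can then be pasted into a file and passed back on later runs with --suppressions=<file>. An entry matching the report quoted above might look roughly like the following; the entry name and the exact frame list are illustrative, not generated output:

    {
       intel_sse2_strcpy_uninit_cond
       Memcheck:Cond
       fun:__intel_sse2_strcpy
       fun:PetscStrcpy
       ...
    }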
>> >> -- Regards, Alexander From gokhalen at gmail.com Thu Dec 6 12:33:25 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 6 Dec 2012 13:33:25 -0500 Subject: [petsc-users] real and imaginary part of a number Message-ID: Does petsc provide functions to get real and imaginary parts of a number? I couldn't seem to find any functions in http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/index.html or in the vec collective either. Cheers, -Nachiket -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Thu Dec 6 12:35:02 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Thu, 6 Dec 2012 10:35:02 -0800 Subject: [petsc-users] real and imaginary part of a number In-Reply-To: References: Message-ID: PetscRealPart() and PetscImaginaryPart() It looks like none of the math functions have man pages. On Thu, Dec 6, 2012 at 10:33 AM, Nachiket Gokhale wrote: > Does petsc provide functions to get real and imaginary parts of a number? > I couldn't seem to find any functions in > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/index.html > > or in the vec collective either. > > Cheers, > > -Nachiket > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Thu Dec 6 12:46:42 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Thu, 6 Dec 2012 10:46:42 -0800 Subject: [petsc-users] real and imaginary part of a number In-Reply-To: References: Message-ID: 1. *Always* reply to the list, not me personally. 2. No, but there is VecConjugate(). On Thu, Dec 6, 2012 at 10:39 AM, Nachiket Gokhale wrote: > Thanks! And are there corresponding functions for a vector? VecRealPart > and VecImagPart? > > -Nachiket > > On Thu, Dec 6, 2012 at 1:35 PM, Jed Brown wrote: > >> PetscRealPart() and PetscImaginaryPart() >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.witkowski at tu-dresden.de Sat Dec 8 07:59:12 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Sat, 08 Dec 2012 14:59:12 +0100 Subject: [petsc-users] Creating explicit matrix scatter Message-ID: <50C347B0.3020300@tu-dresden.de> A have a distributed MATAIJ, which is non square. I want to create a new matrix, which has the same col layout but a different row layout and should be scattered from the original matrix. Thus, each rank should collect some rows, which may be non local in the original matrix, to its own local part of the new matrix. After creating the new matrix, I need not only to make some MatMult, but I need local access to the matrix rows. How to do this? Thanks for any advise. Thomas From jedbrown at mcs.anl.gov Sat Dec 8 08:12:48 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 8 Dec 2012 06:12:48 -0800 Subject: [petsc-users] Creating explicit matrix scatter In-Reply-To: <50C347B0.3020300@tu-dresden.de> References: <50C347B0.3020300@tu-dresden.de> Message-ID: MatGetSubMatrix() and later, MatGetRow() On Sat, Dec 8, 2012 at 5:59 AM, Thomas Witkowski < thomas.witkowski at tu-dresden.de> wrote: > A have a distributed MATAIJ, which is non square. I want to create a new > matrix, which has the same col layout but a different row layout and should > be scattered from the original matrix. Thus, each rank should collect some > rows, which may be non local in the original matrix, to its own local part > of the new matrix. After creating the new matrix, I need not only to make > some MatMult, but I need local access to the matrix rows. 
How to do this? > Thanks for any advise. > > Thomas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.witkowski at tu-dresden.de Sat Dec 8 08:33:57 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Sat, 08 Dec 2012 15:33:57 +0100 Subject: [petsc-users] Creating explicit matrix scatter In-Reply-To: References: <50C347B0.3020300@tu-dresden.de> Message-ID: <50C34FD5.8050702@tu-dresden.de> I checked the documentation of MatGetSubMatrix() and found the following: "The rows in isrow will be sorted into the same order as the original matrix on each process." For my case, this will be wrong. I need to say each task exactly which row from the old matrix should be which row in the new matrix. Any other possibility to do this? Thomas Am 08.12.2012 15:12, schrieb Jed Brown: > MatGetSubMatrix() and later, MatGetRow() > > > On Sat, Dec 8, 2012 at 5:59 AM, Thomas Witkowski > > wrote: > > A have a distributed MATAIJ, which is non square. I want to create > a new matrix, which has the same col layout but a different row > layout and should be scattered from the original matrix. Thus, > each rank should collect some rows, which may be non local in the > original matrix, to its own local part of the new matrix. After > creating the new matrix, I need not only to make some MatMult, but > I need local access to the matrix rows. How to do this? Thanks for > any advise. > > Thomas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sat Dec 8 08:37:26 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 8 Dec 2012 06:37:26 -0800 Subject: [petsc-users] Creating explicit matrix scatter In-Reply-To: <50C34FD5.8050702@tu-dresden.de> References: <50C347B0.3020300@tu-dresden.de> <50C34FD5.8050702@tu-dresden.de> Message-ID: How about MatPermute() in a suitable place? I don't know why you would need such a thing. On Sat, Dec 8, 2012 at 6:33 AM, Thomas Witkowski < thomas.witkowski at tu-dresden.de> wrote: > I checked the documentation of MatGetSubMatrix() and found the following: > > "The rows in isrow will be sorted into the same order as the original > matrix on each process." > > For my case, this will be wrong. I need to say each task exactly which row > from the old matrix should be which row in the new matrix. Any other > possibility to do this? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.witkowski at tu-dresden.de Sat Dec 8 09:11:16 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Sat, 08 Dec 2012 16:11:16 +0100 Subject: [petsc-users] Creating explicit matrix scatter In-Reply-To: References: <50C347B0.3020300@tu-dresden.de> <50C34FD5.8050702@tu-dresden.de> Message-ID: <50C35894.3050509@tu-dresden.de> Am 08.12.2012 15:37, schrieb Jed Brown: > How about MatPermute() in a suitable place? Not really. I thing, I will try to solve the problem differently and avoid this matrix construction. Thomas > > I don't know why you would need such a thing. > > On Sat, Dec 8, 2012 at 6:33 AM, Thomas Witkowski > > wrote: > > I checked the documentation of MatGetSubMatrix() and found the > following: > > "The rows in isrow will be sorted into the same order as the > original matrix on each process." > > For my case, this will be wrong. I need to say each task exactly > which row from the old matrix should be which row in the new > matrix. Any other possibility to do this? 
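For completeness, a rough sketch of the MatGetSubMatrix()/MatGetRow() route being discussed; the variable names are invented, and the exact semantics of the column IS in parallel are worth checking against the MatGetSubMatrix() man page for petsc-3.3:

    IS       isrow, iscol;
    Mat      Asub;
    PetscInt i, rstart, rend;

    /* newrows[] lists, per rank, the global rows of the old matrix this rank
       wants in its local part of the new matrix; ncols_local/firstcol describe
       this rank's share of the (unchanged) column layout. */
    ISCreateGeneral(PETSC_COMM_WORLD, nrows_local, newrows, PETSC_COPY_VALUES, &isrow);
    ISCreateStride(PETSC_COMM_WORLD, ncols_local, firstcol, 1, &iscol);
    MatGetSubMatrix(Aold, isrow, iscol, MAT_INITIAL_MATRIX, &Asub);

    /* local, read-only access to the rows afterwards */
    MatGetOwnershipRange(Asub, &rstart, &rend);
    for (i = rstart; i < rend; i++) {
      PetscInt          ncols;
      const PetscInt    *cols;
      const PetscScalar *vals;
      MatGetRow(Asub, i, &ncols, &cols, &vals);
      /* ... use row i ... */
      MatRestoreRow(Asub, i, &ncols, &cols, &vals);
    }

As the quoted documentation says, the extracted rows keep the ordering of the original matrix within each process, so a specific target ordering would still need a MatPermute() (or a different construction) afterwards.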
> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sat Dec 8 09:15:07 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 8 Dec 2012 07:15:07 -0800 Subject: [petsc-users] Creating explicit matrix scatter In-Reply-To: <50C35894.3050509@tu-dresden.de> References: <50C347B0.3020300@tu-dresden.de> <50C34FD5.8050702@tu-dresden.de> <50C35894.3050509@tu-dresden.de> Message-ID: On Sat, Dec 8, 2012 at 7:11 AM, Thomas Witkowski < thomas.witkowski at tu-dresden.de> wrote: > Am 08.12.2012 15:37, schrieb Jed Brown: > > How about MatPermute() in a suitable place? > > Not really. I thing, It should work... > I will try to solve the problem differently and avoid this matrix > construction. ... but I think this is better. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.arndt at stud.uni-goettingen.de Tue Dec 11 08:19:14 2012 From: daniel.arndt at stud.uni-goettingen.de (Daniel Arndt) Date: Tue, 11 Dec 2012 15:19:14 +0100 Subject: [petsc-users] early convergence failure Message-ID: <50C740E2.4080105@stud.uni-goettingen.de> Hello everyone, at the moment I'm trying to solve a Poisson problem with SIPG stabilization and discontinuous finite elements. The matrix is constructed in deal.II. When I try to solve this problem with PETSc's CG solver and a BlockJacobi preconditioner or a BoomerAMG preconditioner from the Hypre package I get this weird error message. Exception on processing: Iterative method reported convergence failure in step 3 with residual 1.50616 Aborting! Since the solver is allowed to take 5000 steps this convergence failure is clearly early. Did anyone encounter such an error before? What can produce such an early convergence failure? Thanks in advance, Daniel From bsmith at mcs.anl.gov Tue Dec 11 08:28:52 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 11 Dec 2012 08:28:52 -0600 Subject: [petsc-users] early convergence failure In-Reply-To: <50C740E2.4080105@stud.uni-goettingen.de> References: <50C740E2.4080105@stud.uni-goettingen.de> Message-ID: Daniel, That message is not coming from PETSc so likely Deal.II is processing the result from KSPConvergedReason() and generating that less then totally useful output. If you run with -ksp_converged_reason PETSC will (if Deal.II processes PETSc options correctly) print a more complete reason. Off hand I am guessing that CG detected a non-symmetric or indefinite matrix or preconditioner which it cannot handle so it barfed out. You can run with GMRES instead of CG and if that converges then this is the likely explanation. Barry On Dec 11, 2012, at 8:19 AM, Daniel Arndt wrote: > Hello everyone, > > at the moment I'm trying to solve a Poisson problem with SIPG > stabilization and discontinuous finite elements. The matrix is > constructed in deal.II. When I try to solve this problem with PETSc's CG > solver and a BlockJacobi preconditioner or a BoomerAMG preconditioner > from the Hypre package I get this weird error message. > > Exception on processing: > Iterative method reported convergence failure in step 3 with residual > 1.50616 > Aborting! > > Since the solver is allowed to take 5000 steps this convergence failure > is clearly early. Did anyone encounter such an error before? What can > produce such an early convergence failure? 
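A concrete form of the two suggestions above, assuming the application forwards command-line options to PetscInitialize(): run with -ksp_type gmres -ksp_converged_reason (optionally -ksp_monitor_true_residual), or query the reason directly after the solve. The snippet below is only a sketch with error checking omitted:

    KSPConvergedReason reason;

    KSPSolve(ksp, b, x);
    KSPGetConvergedReason(ksp, &reason);
    if (reason < 0) {
      PetscPrintf(PETSC_COMM_WORLD, "KSP diverged with reason %d\n", (int)reason);
    }

A negative reason corresponds to one of the KSP_DIVERGED_* values in petscksp.h, which is more informative than the generic message printed by the application.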
> > Thanks in advance, > Daniel > > > From daniel.arndt at stud.uni-goettingen.de Tue Dec 11 10:00:41 2012 From: daniel.arndt at stud.uni-goettingen.de (Daniel Arndt) Date: Tue, 11 Dec 2012 17:00:41 +0100 Subject: [petsc-users] early convergence failure In-Reply-To: References: Message-ID: <50C758A9.4010408@stud.uni-goettingen.de> Thank you Barry for your suggestions. The error I get is now KSP_DIVERGED_INDEFINITE_PC. The matrix that I try to invert is actually symmetric and positive definite. I was not aware that this can lead to an indefinite preconditioner. If I use a Jacobi preconditioner or tell BoomerAMG that the matrix is symmetric I don't encounter any errors. So I'm quiet for now :-) Daniel > Daniel, > > That message is not coming from PETSc so likely Deal.II is processing the result from KSPConvergedReason() and generating that less then totally useful output. If you run with -ksp_converged_reason PETSC will (if Deal.II processes PETSc options correctly) print a more complete reason. > > Off hand I am guessing that CG detected a non-symmetric or indefinite matrix or preconditioner which it cannot handle so it barfed out. You can run with GMRES instead of CG and if that converges then this is the likely explanation. > > Barry > > > On Dec 11, 2012, at 8:19 AM, Daniel Arndt > wrote: > > > > Hello everyone, > > > > at the moment I'm trying to solve a Poisson problem with SIPG > > stabilization and discontinuous finite elements. The matrix is > > constructed in deal.II. When I try to solve this problem with PETSc's CG > > solver and a BlockJacobi preconditioner or a BoomerAMG preconditioner > > from the Hypre package I get this weird error message. > > > > Exception on processing: > > Iterative method reported convergence failure in step 3 with residual > > 1.50616 > > Aborting! > > > > Since the solver is allowed to take 5000 steps this convergence failure > > is clearly early. Did anyone encounter such an error before? What can > > produce such an early convergence failure? > > > > Thanks in advance, > > Daniel > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Dec 11 10:26:54 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 11 Dec 2012 10:26:54 -0600 Subject: [petsc-users] early convergence failure In-Reply-To: <50C758A9.4010408@stud.uni-goettingen.de> References: <50C758A9.4010408@stud.uni-goettingen.de> Message-ID: <7B5C8583-1A21-49C3-B8B3-FF708B84D4F8@mcs.anl.gov> On Dec 11, 2012, at 10:00 AM, Daniel Arndt wrote: > Thank you Barry for your suggestions. > > The error I get is now KSP_DIVERGED_INDEFINITE_PC. The matrix that I try to invert is actually symmetric and positive definite. I was not aware that this can lead to an indefinite preconditioner. Absolutely. Many preconditioners do not retain this feature even in exact precision and with numerical effects it can even appear unexpected. By default BoomerAMG doesn't retain this. > If I use a Jacobi preconditioner or tell BoomerAMG that the matrix is symmetric I don't encounter any errors. So I'm quiet for now :-) > > Daniel >> Daniel, >> >> That message is not coming from PETSc so likely Deal.II is processing the result from KSPConvergedReason() and generating that less then totally useful output. If you run with -ksp_converged_reason PETSC will (if Deal.II processes PETSc options correctly) print a more complete reason. 
>> >> Off hand I am guessing that CG detected a non-symmetric or indefinite matrix or preconditioner which it cannot handle so it barfed out. You can run with GMRES instead of CG and if that converges then this is the likely explanation. >> >> Barry >> >> >> On Dec 11, 2012, at 8:19 AM, Daniel Arndt < >> daniel.arndt at stud.uni-goettingen.de >> > wrote: >> >> > >> Hello everyone, >> >> > >> >> >> > >> at the moment I'm trying to solve a Poisson problem with SIPG >> >> > >> stabilization and discontinuous finite elements. The matrix is >> >> > >> constructed in deal.II. When I try to solve this problem with PETSc's CG >> >> > >> solver and a BlockJacobi preconditioner or a BoomerAMG preconditioner >> >> > >> from the Hypre package I get this weird error message. >> >> > >> >> >> > >> Exception on processing: >> >> > >> Iterative method reported convergence failure in step 3 with residual >> >> > >> 1.50616 >> >> > >> Aborting! >> >> > >> >> >> > >> Since the solver is allowed to take 5000 steps this convergence failure >> >> > >> is clearly early. Did anyone encounter such an error before? What can >> >> > >> produce such an early convergence failure? >> >> > >> >> >> > >> Thanks in advance, >> >> > >> Daniel >> >> > >> >> >> > >> >> >> > >> >> >> From ling.zou at inl.gov Tue Dec 11 15:34:01 2012 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 11 Dec 2012 14:34:01 -0700 Subject: [petsc-users] how to control snes_mf_operator Message-ID: Dear All, I have recently had an issue using snes_mf_operator. I've tried to figure it out from PETSc manual and PETSc website but didn't get any luck, so I submit my question here and hope some one could help me out. (1) ================================================================= A little bit background here: my problem has 7 variables, i.e., U = [U0, U1, U2, U3, U4, U5, U6] U0 is in the order of 1. U1, U2, U4 and U5 in the oder of 100. U3 and U6 are in the order of 1.e8. I believe this should be quite common for most PETSc users. (2) ================================================================= My problem here is, U0, by its physical meaning, has to be limited between 0 and 1. When PETSc starts to perturb the initial solution of U (which I believe properly set) to approximate the operation of J (dU), the U0 get a perturbation size in the order of 100, which causes problem as U0 has to be smaller than 1. >From my observation, this same perturbation size, say eps, is applied on all U0, U1, U2, etc. <=== Is this the default setting? I also guess that this eps, in the order of 100, is determined from my initial solution vector and other related PETSc parameters. <=== Is my guessing right? (3) ================================================================= My question: I'd like to avoid a perturbation size ~100 on U0, i.e., I have to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 situation. Is there any way to control that? Or, is there any advanced option to control the perturbation size on different variables when using snes_mf_operator? Hope my explanation is clear. Please let me know if it is not. Best Regards, Ling -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Dec 11 15:40:37 2012 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 Dec 2012 13:40:37 -0800 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling wrote: > Dear All, > > I have recently had an issue using snes_mf_operator. I've tried to figure it > out from PETSc manual and PETSc website but didn't get any luck, so I submit > my question here and hope some one could help me out. > > (1) > ================================================================= > A little bit background here: my problem has 7 variables, i.e., > > U = [U0, U1, U2, U3, U4, U5, U6] > > U0 is in the order of 1. > U1, U2, U4 and U5 in the oder of 100. > U3 and U6 are in the order of 1.e8. > > I believe this should be quite common for most PETSc users. > > (2) > ================================================================= > My problem here is, U0, by its physical meaning, has to be limited between 0 > and 1. When PETSc starts to perturb the initial solution of U (which I > believe properly set) to approximate the operation of J (dU), the U0 get a > perturbation size in the order of 100, which causes problem as U0 has to be > smaller than 1. > > From my observation, this same perturbation size, say eps, is applied on all > U0, U1, U2, etc. <=== Is this the default setting? > I also guess that this eps, in the order of 100, is determined from my > initial solution vector and other related PETSc parameters. <=== Is my > guessing right? > > (3) > ================================================================= > My question: I'd like to avoid a perturbation size ~100 on U0, i.e., I have > to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 > situation. Is there any way to control that? > Or, is there any advanced option to control the perturbation size on > different variables when using snes_mf_operator? Here is a description of the algorithm for calculating h. It seems to me a better way to do this is to non-dimensionalize first. Matt > > Hope my explanation is clear. Please let me know if it is not. > > > Best Regards, > > Ling > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From knepley at gmail.com Tue Dec 11 15:41:00 2012 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 Dec 2012 13:41:00 -0800 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley wrote: > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling wrote: >> Dear All, >> >> I have recently had an issue using snes_mf_operator. I've tried to figure it >> out from PETSc manual and PETSc website but didn't get any luck, so I submit >> my question here and hope some one could help me out. >> >> (1) >> ================================================================= >> A little bit background here: my problem has 7 variables, i.e., >> >> U = [U0, U1, U2, U3, U4, U5, U6] >> >> U0 is in the order of 1. >> U1, U2, U4 and U5 in the oder of 100. >> U3 and U6 are in the order of 1.e8. >> >> I believe this should be quite common for most PETSc users. >> >> (2) >> ================================================================= >> My problem here is, U0, by its physical meaning, has to be limited between 0 >> and 1. 
When PETSc starts to perturb the initial solution of U (which I >> believe properly set) to approximate the operation of J (dU), the U0 get a >> perturbation size in the order of 100, which causes problem as U0 has to be >> smaller than 1. >> >> From my observation, this same perturbation size, say eps, is applied on all >> U0, U1, U2, etc. <=== Is this the default setting? >> I also guess that this eps, in the order of 100, is determined from my >> initial solution vector and other related PETSc parameters. <=== Is my >> guessing right? >> >> (3) >> ================================================================= >> My question: I'd like to avoid a perturbation size ~100 on U0, i.e., I have >> to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 >> situation. Is there any way to control that? >> Or, is there any advanced option to control the perturbation size on >> different variables when using snes_mf_operator? > > Here is a description of the algorithm for calculating h. It seems to > me a better way to do this > is to non-dimensionalize first. I forgot the URL: http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD Matt > Matt > >> >> Hope my explanation is clear. Please let me know if it is not. >> >> >> Best Regards, >> >> Ling >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From ling.zou at inl.gov Tue Dec 11 15:47:08 2012 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 11 Dec 2012 14:47:08 -0700 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: thank you Matt. I will try to figure it out. Non-dimensionalization is certainly something worth to try. Best, Ling On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley wrote: > On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley > wrote: > > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling > wrote: > >> Dear All, > >> > >> I have recently had an issue using snes_mf_operator. I've tried to > figure it > >> out from PETSc manual and PETSc website but didn't get any luck, so I > submit > >> my question here and hope some one could help me out. > >> > >> (1) > >> ================================================================= > >> A little bit background here: my problem has 7 variables, i.e., > >> > >> U = [U0, U1, U2, U3, U4, U5, U6] > >> > >> U0 is in the order of 1. > >> U1, U2, U4 and U5 in the oder of 100. > >> U3 and U6 are in the order of 1.e8. > >> > >> I believe this should be quite common for most PETSc users. > >> > >> (2) > >> ================================================================= > >> My problem here is, U0, by its physical meaning, has to be limited > between 0 > >> and 1. When PETSc starts to perturb the initial solution of U (which I > >> believe properly set) to approximate the operation of J (dU), the U0 > get a > >> perturbation size in the order of 100, which causes problem as U0 has > to be > >> smaller than 1. > >> > >> From my observation, this same perturbation size, say eps, is applied > on all > >> U0, U1, U2, etc. <=== Is this the default setting? 
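(On the question just above: yes, as far as I know a single scalar differencing parameter h is used for the whole perturbation, not one per component. From the MatCreateMFFD page linked above, the two built-in ways of choosing it are roughly

    wp (I believe the default):  h = error_rel * sqrt(1 + ||u||) / ||a||
    ds:                          h = error_rel * u'a / ||a||^2                          if |u'a| > umin * ||a||_1
                                 h = error_rel * umin * sign(u'a) * ||a||_1 / ||a||^2   otherwise

where u is the current state, a is the direction being differenced, error_rel is set with -mat_mffd_err and umin with -mat_mffd_umin; umin only enters the ds formula, which is selected with -mat_mffd_type ds. With components spread over eight orders of magnitude any single h is a compromise, and a value that is reasonable for the 1e8-scale entries can easily be huge relative to U0, which is why rescaling the variables (the non-dimensionalization already suggested) is the robust fix.)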
> >> I also guess that this eps, in the order of 100, is determined from my > >> initial solution vector and other related PETSc parameters. <=== Is my > >> guessing right? > >> > >> (3) > >> ================================================================= > >> My question: I'd like to avoid a perturbation size ~100 on U0, i.e., I > have > >> to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 > >> situation. Is there any way to control that? > >> Or, is there any advanced option to control the perturbation size on > >> different variables when using snes_mf_operator? > > > > Here is a description of the algorithm for calculating h. It seems to > > me a better way to do this > > is to non-dimensionalize first. > > I forgot the URL: > > http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD > > Matt > > > Matt > > > >> > >> Hope my explanation is clear. Please let me know if it is not. > >> > >> > >> Best Regards, > >> > >> Ling > >> > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ling.zou at inl.gov Tue Dec 11 16:19:31 2012 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 11 Dec 2012 15:19:31 -0700 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: Matt, one more question. Can I combine the options -snes_type test and -mat_mffd_err 1.e-10 to see the effect? Best, Ling On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling wrote: > thank you Matt. I will try to figure it out. Non-dimensionalization is > certainly something worth to try. > > Best, > > Ling > > > On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley wrote: > >> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley >> wrote: >> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling >> wrote: >> >> Dear All, >> >> >> >> I have recently had an issue using snes_mf_operator. I've tried to >> figure it >> >> out from PETSc manual and PETSc website but didn't get any luck, so I >> submit >> >> my question here and hope some one could help me out. >> >> >> >> (1) >> >> ================================================================= >> >> A little bit background here: my problem has 7 variables, i.e., >> >> >> >> U = [U0, U1, U2, U3, U4, U5, U6] >> >> >> >> U0 is in the order of 1. >> >> U1, U2, U4 and U5 in the oder of 100. >> >> U3 and U6 are in the order of 1.e8. >> >> >> >> I believe this should be quite common for most PETSc users. >> >> >> >> (2) >> >> ================================================================= >> >> My problem here is, U0, by its physical meaning, has to be limited >> between 0 >> >> and 1. When PETSc starts to perturb the initial solution of U (which I >> >> believe properly set) to approximate the operation of J (dU), the U0 >> get a >> >> perturbation size in the order of 100, which causes problem as U0 has >> to be >> >> smaller than 1. >> >> >> >> From my observation, this same perturbation size, say eps, is applied >> on all >> >> U0, U1, U2, etc. <=== Is this the default setting? 
>> >> I also guess that this eps, in the order of 100, is determined from my >> >> initial solution vector and other related PETSc parameters. <=== Is my >> >> guessing right? >> >> >> >> (3) >> >> ================================================================= >> >> My question: I'd like to avoid a perturbation size ~100 on U0, i.e., I >> have >> >> to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 >> >> situation. Is there any way to control that? >> >> Or, is there any advanced option to control the perturbation size on >> >> different variables when using snes_mf_operator? >> > >> > Here is a description of the algorithm for calculating h. It seems to >> > me a better way to do this >> > is to non-dimensionalize first. >> >> I forgot the URL: >> >> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD >> >> Matt >> >> > Matt >> > >> >> >> >> Hope my explanation is clear. Please let me know if it is not. >> >> >> >> >> >> Best Regards, >> >> >> >> Ling >> >> >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments is infinitely more interesting than any results to which >> > their experiments lead. >> > -- Norbert Wiener >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Dec 11 16:29:33 2012 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 Dec 2012 14:29:33 -0800 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: On Tue, Dec 11, 2012 at 2:19 PM, Zou (Non-US), Ling wrote: > Matt, one more question. > > Can I combine the options > -snes_type test > and > -mat_mffd_err 1.e-10 > to see the effect? I do not understand your question. test does compare the analytic and FD Jacobian actions, but I thought you did not have an analytic action. Matt > Best, > > Ling > > > > On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling > wrote: >> >> thank you Matt. I will try to figure it out. Non-dimensionalization is >> certainly something worth to try. >> >> Best, >> >> Ling >> >> >> On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley >> wrote: >>> >>> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley >>> wrote: >>> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling >>> > wrote: >>> >> Dear All, >>> >> >>> >> I have recently had an issue using snes_mf_operator. I've tried to >>> >> figure it >>> >> out from PETSc manual and PETSc website but didn't get any luck, so I >>> >> submit >>> >> my question here and hope some one could help me out. >>> >> >>> >> (1) >>> >> ================================================================= >>> >> A little bit background here: my problem has 7 variables, i.e., >>> >> >>> >> U = [U0, U1, U2, U3, U4, U5, U6] >>> >> >>> >> U0 is in the order of 1. >>> >> U1, U2, U4 and U5 in the oder of 100. >>> >> U3 and U6 are in the order of 1.e8. >>> >> >>> >> I believe this should be quite common for most PETSc users. >>> >> >>> >> (2) >>> >> ================================================================= >>> >> My problem here is, U0, by its physical meaning, has to be limited >>> >> between 0 >>> >> and 1. 
When PETSc starts to perturb the initial solution of U (which I >>> >> believe properly set) to approximate the operation of J (dU), the U0 >>> >> get a >>> >> perturbation size in the order of 100, which causes problem as U0 has >>> >> to be >>> >> smaller than 1. >>> >> >>> >> From my observation, this same perturbation size, say eps, is applied >>> >> on all >>> >> U0, U1, U2, etc. <=== Is this the default setting? >>> >> I also guess that this eps, in the order of 100, is determined from my >>> >> initial solution vector and other related PETSc parameters. <=== Is >>> >> my >>> >> guessing right? >>> >> >>> >> (3) >>> >> ================================================================= >>> >> My question: I'd like to avoid a perturbation size ~100 on U0, i.e., I >>> >> have >>> >> to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 >>> >> situation. Is there any way to control that? >>> >> Or, is there any advanced option to control the perturbation size on >>> >> different variables when using snes_mf_operator? >>> > >>> > Here is a description of the algorithm for calculating h. It seems to >>> > me a better way to do this >>> > is to non-dimensionalize first. >>> >>> I forgot the URL: >>> >>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD >>> >>> Matt >>> >>> > Matt >>> > >>> >> >>> >> Hope my explanation is clear. Please let me know if it is not. >>> >> >>> >> >>> >> Best Regards, >>> >> >>> >> Ling >>> >> >>> > >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> > experiments is infinitely more interesting than any results to which >>> > their experiments lead. >>> > -- Norbert Wiener >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which >>> their experiments lead. >>> -- Norbert Wiener >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From ling.zou at inl.gov Tue Dec 11 16:40:42 2012 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 11 Dec 2012 15:40:42 -0700 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: Hmm... I have an 'approximated' analytical Jacobian to compare. And I did this: ./my-moose-project -i input.i -snes_type test -snes_test_display > out I actually found out that the PETSc provided FD Jacobian gives 'nan' numbers, while my approximated Jacobian does not give 'nan' at the same positions. As we discussed in the previous emails, the perturbation on U0 is too large, which makes 'nan' appear in the FD Jacobians. So....I am trying to use a smaller '-mat_mffd_err ', to see if I could get an easy fix by now, like this, ./my-moose-project -i input.i -snes_type test -md_mffd_err 1.e-10 -snes_test_display > out seems not working :-( no matter what number I give to -md_mffd_err, the print out results seem not changed. But of course, non-dimensionalization might be the ultimate solution. Ling On Tue, Dec 11, 2012 at 3:29 PM, Matthew Knepley wrote: > On Tue, Dec 11, 2012 at 2:19 PM, Zou (Non-US), Ling > wrote: > > Matt, one more question. > > > > Can I combine the options > > -snes_type test > > and > > -mat_mffd_err 1.e-10 > > to see the effect? > > I do not understand your question. 
test does compare the analytic and > FD Jacobian > actions, but I thought you did not have an analytic action. > > Matt > > > Best, > > > > Ling > > > > > > > > On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling > > wrote: > >> > >> thank you Matt. I will try to figure it out. Non-dimensionalization is > >> certainly something worth to try. > >> > >> Best, > >> > >> Ling > >> > >> > >> On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley > >> wrote: > >>> > >>> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley > >>> wrote: > >>> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling < > ling.zou at inl.gov> > >>> > wrote: > >>> >> Dear All, > >>> >> > >>> >> I have recently had an issue using snes_mf_operator. I've tried to > >>> >> figure it > >>> >> out from PETSc manual and PETSc website but didn't get any luck, so > I > >>> >> submit > >>> >> my question here and hope some one could help me out. > >>> >> > >>> >> (1) > >>> >> ================================================================= > >>> >> A little bit background here: my problem has 7 variables, i.e., > >>> >> > >>> >> U = [U0, U1, U2, U3, U4, U5, U6] > >>> >> > >>> >> U0 is in the order of 1. > >>> >> U1, U2, U4 and U5 in the oder of 100. > >>> >> U3 and U6 are in the order of 1.e8. > >>> >> > >>> >> I believe this should be quite common for most PETSc users. > >>> >> > >>> >> (2) > >>> >> ================================================================= > >>> >> My problem here is, U0, by its physical meaning, has to be limited > >>> >> between 0 > >>> >> and 1. When PETSc starts to perturb the initial solution of U > (which I > >>> >> believe properly set) to approximate the operation of J (dU), the U0 > >>> >> get a > >>> >> perturbation size in the order of 100, which causes problem as U0 > has > >>> >> to be > >>> >> smaller than 1. > >>> >> > >>> >> From my observation, this same perturbation size, say eps, is > applied > >>> >> on all > >>> >> U0, U1, U2, etc. <=== Is this the default setting? > >>> >> I also guess that this eps, in the order of 100, is determined from > my > >>> >> initial solution vector and other related PETSc parameters. <=== Is > >>> >> my > >>> >> guessing right? > >>> >> > >>> >> (3) > >>> >> ================================================================= > >>> >> My question: I'd like to avoid a perturbation size ~100 on U0, > i.e., I > >>> >> have > >>> >> to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 > >>> >> situation. Is there any way to control that? > >>> >> Or, is there any advanced option to control the perturbation size on > >>> >> different variables when using snes_mf_operator? > >>> > > >>> > Here is a description of the algorithm for calculating h. It seems to > >>> > me a better way to do this > >>> > is to non-dimensionalize first. > >>> > >>> I forgot the URL: > >>> > >>> > http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD > >>> > >>> Matt > >>> > >>> > Matt > >>> > > >>> >> > >>> >> Hope my explanation is clear. Please let me know if it is not. > >>> >> > >>> >> > >>> >> Best Regards, > >>> >> > >>> >> Ling > >>> >> > >>> > > >>> > > >>> > > >>> > -- > >>> > What most experimenters take for granted before they begin their > >>> > experiments is infinitely more interesting than any results to which > >>> > their experiments lead. 
> >>> > -- Norbert Wiener > >>> > >>> > >>> > >>> -- > >>> What most experimenters take for granted before they begin their > >>> experiments is infinitely more interesting than any results to which > >>> their experiments lead. > >>> -- Norbert Wiener > >> > >> > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Dec 11 16:50:52 2012 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 Dec 2012 14:50:52 -0800 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: On Tue, Dec 11, 2012 at 2:40 PM, Zou (Non-US), Ling wrote: > Hmm... I have an 'approximated' analytical Jacobian to compare. And I did > this: > > ./my-moose-project -i input.i -snes_type test -snes_test_display > out > > I actually found out that the PETSc provided FD Jacobian gives 'nan' > numbers, while my approximated Jacobian does not give 'nan' at the same > positions. > > As we discussed in the previous emails, the perturbation on U0 is too large, > which makes 'nan' appear in the FD Jacobians. So....I am trying to use a > smaller '-mat_mffd_err ', to see if I could get an easy fix by > now, like this, I don't think 'err' has anything to do with it. If you read the page I mailed you, I believe umin can be made very small. Matt > ./my-moose-project -i input.i -snes_type test -md_mffd_err 1.e-10 > -snes_test_display > out > > seems not working :-( > no matter what number I give to -md_mffd_err, the print out results seem not > changed. > > But of course, non-dimensionalization might be the ultimate solution. > > Ling > > On Tue, Dec 11, 2012 at 3:29 PM, Matthew Knepley wrote: >> >> On Tue, Dec 11, 2012 at 2:19 PM, Zou (Non-US), Ling >> wrote: >> > Matt, one more question. >> > >> > Can I combine the options >> > -snes_type test >> > and >> > -mat_mffd_err 1.e-10 >> > to see the effect? >> >> I do not understand your question. test does compare the analytic and >> FD Jacobian >> actions, but I thought you did not have an analytic action. >> >> Matt >> >> > Best, >> > >> > Ling >> > >> > >> > >> > On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling >> > wrote: >> >> >> >> thank you Matt. I will try to figure it out. Non-dimensionalization is >> >> certainly something worth to try. >> >> >> >> Best, >> >> >> >> Ling >> >> >> >> >> >> On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley >> >> wrote: >> >>> >> >>> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley >> >>> wrote: >> >>> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling >> >>> > >> >>> > wrote: >> >>> >> Dear All, >> >>> >> >> >>> >> I have recently had an issue using snes_mf_operator. I've tried to >> >>> >> figure it >> >>> >> out from PETSc manual and PETSc website but didn't get any luck, so >> >>> >> I >> >>> >> submit >> >>> >> my question here and hope some one could help me out. >> >>> >> >> >>> >> (1) >> >>> >> ================================================================= >> >>> >> A little bit background here: my problem has 7 variables, i.e., >> >>> >> >> >>> >> U = [U0, U1, U2, U3, U4, U5, U6] >> >>> >> >> >>> >> U0 is in the order of 1. >> >>> >> U1, U2, U4 and U5 in the oder of 100. >> >>> >> U3 and U6 are in the order of 1.e8. >> >>> >> >> >>> >> I believe this should be quite common for most PETSc users. 
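(An aside on the "-mat_mffd_err seems not working" observation above: as far as I can tell, -snes_type test does not use the matrix-free MFFD operator at all; it builds its comparison Jacobian with PETSc's explicit finite-difference routine (SNESDefaultComputeJacobian in 3.3) and checks that against the user-provided one, so the -mat_mffd_* options are not expected to change its output. Those options only affect the operator applied during an actual -snes_mf / -snes_mf_operator solve, and -mat_mffd_umin in particular belongs to the ds differencing variant, so an untested, illustrative invocation where it could matter would look like

    ./my-moose-project -i input.i -snes_mf_operator -mat_mffd_type ds -mat_mffd_umin 1.e-6

rather than a -snes_type test run.)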
>> >>> >> >> >>> >> (2) >> >>> >> ================================================================= >> >>> >> My problem here is, U0, by its physical meaning, has to be limited >> >>> >> between 0 >> >>> >> and 1. When PETSc starts to perturb the initial solution of U >> >>> >> (which I >> >>> >> believe properly set) to approximate the operation of J (dU), the >> >>> >> U0 >> >>> >> get a >> >>> >> perturbation size in the order of 100, which causes problem as U0 >> >>> >> has >> >>> >> to be >> >>> >> smaller than 1. >> >>> >> >> >>> >> From my observation, this same perturbation size, say eps, is >> >>> >> applied >> >>> >> on all >> >>> >> U0, U1, U2, etc. <=== Is this the default setting? >> >>> >> I also guess that this eps, in the order of 100, is determined from >> >>> >> my >> >>> >> initial solution vector and other related PETSc parameters. <=== >> >>> >> Is >> >>> >> my >> >>> >> guessing right? >> >>> >> >> >>> >> (3) >> >>> >> ================================================================= >> >>> >> My question: I'd like to avoid a perturbation size ~100 on U0, >> >>> >> i.e., I >> >>> >> have >> >>> >> to limit it to be ~0.01 (or some small number) to avoid the U0 > 1 >> >>> >> situation. Is there any way to control that? >> >>> >> Or, is there any advanced option to control the perturbation size >> >>> >> on >> >>> >> different variables when using snes_mf_operator? >> >>> > >> >>> > Here is a description of the algorithm for calculating h. It seems >> >>> > to >> >>> > me a better way to do this >> >>> > is to non-dimensionalize first. >> >>> >> >>> I forgot the URL: >> >>> >> >>> >> >>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD >> >>> >> >>> Matt >> >>> >> >>> > Matt >> >>> > >> >>> >> >> >>> >> Hope my explanation is clear. Please let me know if it is not. >> >>> >> >> >>> >> >> >>> >> Best Regards, >> >>> >> >> >>> >> Ling >> >>> >> >> >>> > >> >>> > >> >>> > >> >>> > -- >> >>> > What most experimenters take for granted before they begin their >> >>> > experiments is infinitely more interesting than any results to which >> >>> > their experiments lead. >> >>> > -- Norbert Wiener >> >>> >> >>> >> >>> >> >>> -- >> >>> What most experimenters take for granted before they begin their >> >>> experiments is infinitely more interesting than any results to which >> >>> their experiments lead. >> >>> -- Norbert Wiener >> >> >> >> >> > >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From ling.zou at inl.gov Tue Dec 11 16:59:10 2012 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 11 Dec 2012 15:59:10 -0700 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: ok. I tried. Seems there is no effect. ./my-moose-project -i input.i -snes_type test -mat_mffd_umin 1.e-10 -snes_test_display > out Also, the webpage says: *-mat_mffd_unim *I am not quite sure if 'unim' is a typo. I tried both 'umin' and 'unim' anyway. Ling On Tue, Dec 11, 2012 at 3:50 PM, Matthew Knepley wrote: > On Tue, Dec 11, 2012 at 2:40 PM, Zou (Non-US), Ling > wrote: > > Hmm... I have an 'approximated' analytical Jacobian to compare. 
And I did > > this: > > > > ./my-moose-project -i input.i -snes_type test -snes_test_display > out > > > > I actually found out that the PETSc provided FD Jacobian gives 'nan' > > numbers, while my approximated Jacobian does not give 'nan' at the same > > positions. > > > > As we discussed in the previous emails, the perturbation on U0 is too > large, > > which makes 'nan' appear in the FD Jacobians. So....I am trying to use a > > smaller '-mat_mffd_err ', to see if I could get an easy fix > by > > now, like this, > > I don't think 'err' has anything to do with it. If you read the page I > mailed you, I > believe umin can be made very small. > > Matt > > > ./my-moose-project -i input.i -snes_type test -md_mffd_err 1.e-10 > > -snes_test_display > out > > > > seems not working :-( > > no matter what number I give to -md_mffd_err, the print out results seem > not > > changed. > > > > But of course, non-dimensionalization might be the ultimate solution. > > > > Ling > > > > On Tue, Dec 11, 2012 at 3:29 PM, Matthew Knepley > wrote: > >> > >> On Tue, Dec 11, 2012 at 2:19 PM, Zou (Non-US), Ling > >> wrote: > >> > Matt, one more question. > >> > > >> > Can I combine the options > >> > -snes_type test > >> > and > >> > -mat_mffd_err 1.e-10 > >> > to see the effect? > >> > >> I do not understand your question. test does compare the analytic and > >> FD Jacobian > >> actions, but I thought you did not have an analytic action. > >> > >> Matt > >> > >> > Best, > >> > > >> > Ling > >> > > >> > > >> > > >> > On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling > > >> > wrote: > >> >> > >> >> thank you Matt. I will try to figure it out. Non-dimensionalization > is > >> >> certainly something worth to try. > >> >> > >> >> Best, > >> >> > >> >> Ling > >> >> > >> >> > >> >> On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley > >> >> wrote: > >> >>> > >> >>> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley > > >> >>> wrote: > >> >>> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling > >> >>> > > >> >>> > wrote: > >> >>> >> Dear All, > >> >>> >> > >> >>> >> I have recently had an issue using snes_mf_operator. I've tried > to > >> >>> >> figure it > >> >>> >> out from PETSc manual and PETSc website but didn't get any luck, > so > >> >>> >> I > >> >>> >> submit > >> >>> >> my question here and hope some one could help me out. > >> >>> >> > >> >>> >> (1) > >> >>> >> ================================================================= > >> >>> >> A little bit background here: my problem has 7 variables, i.e., > >> >>> >> > >> >>> >> U = [U0, U1, U2, U3, U4, U5, U6] > >> >>> >> > >> >>> >> U0 is in the order of 1. > >> >>> >> U1, U2, U4 and U5 in the oder of 100. > >> >>> >> U3 and U6 are in the order of 1.e8. > >> >>> >> > >> >>> >> I believe this should be quite common for most PETSc users. > >> >>> >> > >> >>> >> (2) > >> >>> >> ================================================================= > >> >>> >> My problem here is, U0, by its physical meaning, has to be > limited > >> >>> >> between 0 > >> >>> >> and 1. When PETSc starts to perturb the initial solution of U > >> >>> >> (which I > >> >>> >> believe properly set) to approximate the operation of J (dU), the > >> >>> >> U0 > >> >>> >> get a > >> >>> >> perturbation size in the order of 100, which causes problem as U0 > >> >>> >> has > >> >>> >> to be > >> >>> >> smaller than 1. > >> >>> >> > >> >>> >> From my observation, this same perturbation size, say eps, is > >> >>> >> applied > >> >>> >> on all > >> >>> >> U0, U1, U2, etc. 
<=== Is this the default setting? > >> >>> >> I also guess that this eps, in the order of 100, is determined > from > >> >>> >> my > >> >>> >> initial solution vector and other related PETSc parameters. <=== > >> >>> >> Is > >> >>> >> my > >> >>> >> guessing right? > >> >>> >> > >> >>> >> (3) > >> >>> >> ================================================================= > >> >>> >> My question: I'd like to avoid a perturbation size ~100 on U0, > >> >>> >> i.e., I > >> >>> >> have > >> >>> >> to limit it to be ~0.01 (or some small number) to avoid the U0 > > 1 > >> >>> >> situation. Is there any way to control that? > >> >>> >> Or, is there any advanced option to control the perturbation size > >> >>> >> on > >> >>> >> different variables when using snes_mf_operator? > >> >>> > > >> >>> > Here is a description of the algorithm for calculating h. It seems > >> >>> > to > >> >>> > me a better way to do this > >> >>> > is to non-dimensionalize first. > >> >>> > >> >>> I forgot the URL: > >> >>> > >> >>> > >> >>> > http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD > >> >>> > >> >>> Matt > >> >>> > >> >>> > Matt > >> >>> > > >> >>> >> > >> >>> >> Hope my explanation is clear. Please let me know if it is not. > >> >>> >> > >> >>> >> > >> >>> >> Best Regards, > >> >>> >> > >> >>> >> Ling > >> >>> >> > >> >>> > > >> >>> > > >> >>> > > >> >>> > -- > >> >>> > What most experimenters take for granted before they begin their > >> >>> > experiments is infinitely more interesting than any results to > which > >> >>> > their experiments lead. > >> >>> > -- Norbert Wiener > >> >>> > >> >>> > >> >>> > >> >>> -- > >> >>> What most experimenters take for granted before they begin their > >> >>> experiments is infinitely more interesting than any results to which > >> >>> their experiments lead. > >> >>> -- Norbert Wiener > >> >> > >> >> > >> > > >> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which > >> their experiments lead. > >> -- Norbert Wiener > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Dec 11 17:02:14 2012 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 Dec 2012 15:02:14 -0800 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: On Tue, Dec 11, 2012 at 2:59 PM, Zou (Non-US), Ling wrote: > ok. I tried. Seems there is no effect. > > ./my-moose-project -i input.i -snes_type test -mat_mffd_umin > 1.e-10 -snes_test_display > out > > Also, the webpage says: > *-mat_mffd_unim > > *I am not quite sure if 'unim' is a typo. I tried both 'umin' and 'unim' > anyway. > You can check what is coming in, right? But this is all academic, with that scaling, you will get almost no significant figures in the Jacobian for those unknowns, so why worry about it. Nondimensionalize. Matt > Ling > > > > On Tue, Dec 11, 2012 at 3:50 PM, Matthew Knepley wrote: > >> On Tue, Dec 11, 2012 at 2:40 PM, Zou (Non-US), Ling >> wrote: >> > Hmm... I have an 'approximated' analytical Jacobian to compare. 
And I >> did >> > this: >> > >> > ./my-moose-project -i input.i -snes_type test -snes_test_display > out >> > >> > I actually found out that the PETSc provided FD Jacobian gives 'nan' >> > numbers, while my approximated Jacobian does not give 'nan' at the same >> > positions. >> > >> > As we discussed in the previous emails, the perturbation on U0 is too >> large, >> > which makes 'nan' appear in the FD Jacobians. So....I am trying to use a >> > smaller '-mat_mffd_err ', to see if I could get an easy >> fix by >> > now, like this, >> >> I don't think 'err' has anything to do with it. If you read the page I >> mailed you, I >> believe umin can be made very small. >> >> Matt >> >> > ./my-moose-project -i input.i -snes_type test -md_mffd_err 1.e-10 >> > -snes_test_display > out >> > >> > seems not working :-( >> > no matter what number I give to -md_mffd_err, the print out results >> seem not >> > changed. >> > >> > But of course, non-dimensionalization might be the ultimate solution. >> > >> > Ling >> > >> > On Tue, Dec 11, 2012 at 3:29 PM, Matthew Knepley >> wrote: >> >> >> >> On Tue, Dec 11, 2012 at 2:19 PM, Zou (Non-US), Ling >> >> wrote: >> >> > Matt, one more question. >> >> > >> >> > Can I combine the options >> >> > -snes_type test >> >> > and >> >> > -mat_mffd_err 1.e-10 >> >> > to see the effect? >> >> >> >> I do not understand your question. test does compare the analytic and >> >> FD Jacobian >> >> actions, but I thought you did not have an analytic action. >> >> >> >> Matt >> >> >> >> > Best, >> >> > >> >> > Ling >> >> > >> >> > >> >> > >> >> > On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling < >> ling.zou at inl.gov> >> >> > wrote: >> >> >> >> >> >> thank you Matt. I will try to figure it out. Non-dimensionalization >> is >> >> >> certainly something worth to try. >> >> >> >> >> >> Best, >> >> >> >> >> >> Ling >> >> >> >> >> >> >> >> >> On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley > > >> >> >> wrote: >> >> >>> >> >> >>> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley < >> knepley at gmail.com> >> >> >>> wrote: >> >> >>> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling >> >> >>> > >> >> >>> > wrote: >> >> >>> >> Dear All, >> >> >>> >> >> >> >>> >> I have recently had an issue using snes_mf_operator. I've tried >> to >> >> >>> >> figure it >> >> >>> >> out from PETSc manual and PETSc website but didn't get any >> luck, so >> >> >>> >> I >> >> >>> >> submit >> >> >>> >> my question here and hope some one could help me out. >> >> >>> >> >> >> >>> >> (1) >> >> >>> >> >> ================================================================= >> >> >>> >> A little bit background here: my problem has 7 variables, i.e., >> >> >>> >> >> >> >>> >> U = [U0, U1, U2, U3, U4, U5, U6] >> >> >>> >> >> >> >>> >> U0 is in the order of 1. >> >> >>> >> U1, U2, U4 and U5 in the oder of 100. >> >> >>> >> U3 and U6 are in the order of 1.e8. >> >> >>> >> >> >> >>> >> I believe this should be quite common for most PETSc users. >> >> >>> >> >> >> >>> >> (2) >> >> >>> >> >> ================================================================= >> >> >>> >> My problem here is, U0, by its physical meaning, has to be >> limited >> >> >>> >> between 0 >> >> >>> >> and 1. When PETSc starts to perturb the initial solution of U >> >> >>> >> (which I >> >> >>> >> believe properly set) to approximate the operation of J (dU), >> the >> >> >>> >> U0 >> >> >>> >> get a >> >> >>> >> perturbation size in the order of 100, which causes problem as >> U0 >> >> >>> >> has >> >> >>> >> to be >> >> >>> >> smaller than 1. 
>> >> >>> >> >> >> >>> >> From my observation, this same perturbation size, say eps, is >> >> >>> >> applied >> >> >>> >> on all >> >> >>> >> U0, U1, U2, etc. <=== Is this the default setting? >> >> >>> >> I also guess that this eps, in the order of 100, is determined >> from >> >> >>> >> my >> >> >>> >> initial solution vector and other related PETSc parameters. >> <=== >> >> >>> >> Is >> >> >>> >> my >> >> >>> >> guessing right? >> >> >>> >> >> >> >>> >> (3) >> >> >>> >> >> ================================================================= >> >> >>> >> My question: I'd like to avoid a perturbation size ~100 on U0, >> >> >>> >> i.e., I >> >> >>> >> have >> >> >>> >> to limit it to be ~0.01 (or some small number) to avoid the U0 >> > 1 >> >> >>> >> situation. Is there any way to control that? >> >> >>> >> Or, is there any advanced option to control the perturbation >> size >> >> >>> >> on >> >> >>> >> different variables when using snes_mf_operator? >> >> >>> > >> >> >>> > Here is a description of the algorithm for calculating h. It >> seems >> >> >>> > to >> >> >>> > me a better way to do this >> >> >>> > is to non-dimensionalize first. >> >> >>> >> >> >>> I forgot the URL: >> >> >>> >> >> >>> >> >> >>> >> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD >> >> >>> >> >> >>> Matt >> >> >>> >> >> >>> > Matt >> >> >>> > >> >> >>> >> >> >> >>> >> Hope my explanation is clear. Please let me know if it is not. >> >> >>> >> >> >> >>> >> >> >> >>> >> Best Regards, >> >> >>> >> >> >> >>> >> Ling >> >> >>> >> >> >> >>> > >> >> >>> > >> >> >>> > >> >> >>> > -- >> >> >>> > What most experimenters take for granted before they begin their >> >> >>> > experiments is infinitely more interesting than any results to >> which >> >> >>> > their experiments lead. >> >> >>> > -- Norbert Wiener >> >> >>> >> >> >>> >> >> >>> >> >> >>> -- >> >> >>> What most experimenters take for granted before they begin their >> >> >>> experiments is infinitely more interesting than any results to >> which >> >> >>> their experiments lead. >> >> >>> -- Norbert Wiener >> >> >> >> >> >> >> >> > >> >> >> >> >> >> >> >> -- >> >> What most experimenters take for granted before they begin their >> >> experiments is infinitely more interesting than any results to which >> >> their experiments lead. >> >> -- Norbert Wiener >> > >> > >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ling.zou at inl.gov Tue Dec 11 17:54:56 2012 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 11 Dec 2012 16:54:56 -0700 Subject: [petsc-users] how to control snes_mf_operator In-Reply-To: References: Message-ID: Thank you Matt. Ling On Tue, Dec 11, 2012 at 4:02 PM, Matthew Knepley wrote: > On Tue, Dec 11, 2012 at 2:59 PM, Zou (Non-US), Ling wrote: > >> ok. I tried. Seems there is no effect. >> >> ./my-moose-project -i input.i -snes_type test -mat_mffd_umin >> 1.e-10 -snes_test_display > out >> >> Also, the webpage says: >> *-mat_mffd_unim >> >> *I am not quite sure if 'unim' is a typo. I tried both 'umin' and 'unim' >> anyway. >> > > You can check what is coming in, right? 
But this is all academic, with > that scaling, you will get almost > no significant figures in the Jacobian for those unknowns, so why worry > about it. Nondimensionalize. > > Matt > > >> Ling >> >> >> >> On Tue, Dec 11, 2012 at 3:50 PM, Matthew Knepley wrote: >> >>> On Tue, Dec 11, 2012 at 2:40 PM, Zou (Non-US), Ling >>> wrote: >>> > Hmm... I have an 'approximated' analytical Jacobian to compare. And I >>> did >>> > this: >>> > >>> > ./my-moose-project -i input.i -snes_type test -snes_test_display > out >>> > >>> > I actually found out that the PETSc provided FD Jacobian gives 'nan' >>> > numbers, while my approximated Jacobian does not give 'nan' at the same >>> > positions. >>> > >>> > As we discussed in the previous emails, the perturbation on U0 is too >>> large, >>> > which makes 'nan' appear in the FD Jacobians. So....I am trying to use >>> a >>> > smaller '-mat_mffd_err ', to see if I could get an easy >>> fix by >>> > now, like this, >>> >>> I don't think 'err' has anything to do with it. If you read the page I >>> mailed you, I >>> believe umin can be made very small. >>> >>> Matt >>> >>> > ./my-moose-project -i input.i -snes_type test -md_mffd_err 1.e-10 >>> > -snes_test_display > out >>> > >>> > seems not working :-( >>> > no matter what number I give to -md_mffd_err, the print out results >>> seem not >>> > changed. >>> > >>> > But of course, non-dimensionalization might be the ultimate solution. >>> > >>> > Ling >>> > >>> > On Tue, Dec 11, 2012 at 3:29 PM, Matthew Knepley >>> wrote: >>> >> >>> >> On Tue, Dec 11, 2012 at 2:19 PM, Zou (Non-US), Ling >> > >>> >> wrote: >>> >> > Matt, one more question. >>> >> > >>> >> > Can I combine the options >>> >> > -snes_type test >>> >> > and >>> >> > -mat_mffd_err 1.e-10 >>> >> > to see the effect? >>> >> >>> >> I do not understand your question. test does compare the analytic and >>> >> FD Jacobian >>> >> actions, but I thought you did not have an analytic action. >>> >> >>> >> Matt >>> >> >>> >> > Best, >>> >> > >>> >> > Ling >>> >> > >>> >> > >>> >> > >>> >> > On Tue, Dec 11, 2012 at 2:47 PM, Zou (Non-US), Ling < >>> ling.zou at inl.gov> >>> >> > wrote: >>> >> >> >>> >> >> thank you Matt. I will try to figure it out. >>> Non-dimensionalization is >>> >> >> certainly something worth to try. >>> >> >> >>> >> >> Best, >>> >> >> >>> >> >> Ling >>> >> >> >>> >> >> >>> >> >> On Tue, Dec 11, 2012 at 2:41 PM, Matthew Knepley < >>> knepley at gmail.com> >>> >> >> wrote: >>> >> >>> >>> >> >>> On Tue, Dec 11, 2012 at 1:40 PM, Matthew Knepley < >>> knepley at gmail.com> >>> >> >>> wrote: >>> >> >>> > On Tue, Dec 11, 2012 at 1:34 PM, Zou (Non-US), Ling >>> >> >>> > >>> >> >>> > wrote: >>> >> >>> >> Dear All, >>> >> >>> >> >>> >> >>> >> I have recently had an issue using snes_mf_operator. I've >>> tried to >>> >> >>> >> figure it >>> >> >>> >> out from PETSc manual and PETSc website but didn't get any >>> luck, so >>> >> >>> >> I >>> >> >>> >> submit >>> >> >>> >> my question here and hope some one could help me out. >>> >> >>> >> >>> >> >>> >> (1) >>> >> >>> >> >>> ================================================================= >>> >> >>> >> A little bit background here: my problem has 7 variables, i.e., >>> >> >>> >> >>> >> >>> >> U = [U0, U1, U2, U3, U4, U5, U6] >>> >> >>> >> >>> >> >>> >> U0 is in the order of 1. >>> >> >>> >> U1, U2, U4 and U5 in the oder of 100. >>> >> >>> >> U3 and U6 are in the order of 1.e8. >>> >> >>> >> >>> >> >>> >> I believe this should be quite common for most PETSc users. 
>>> >> >>> >> >>> >> >>> >> (2) >>> >> >>> >> >>> ================================================================= >>> >> >>> >> My problem here is, U0, by its physical meaning, has to be >>> limited >>> >> >>> >> between 0 >>> >> >>> >> and 1. When PETSc starts to perturb the initial solution of U >>> >> >>> >> (which I >>> >> >>> >> believe properly set) to approximate the operation of J (dU), >>> the >>> >> >>> >> U0 >>> >> >>> >> get a >>> >> >>> >> perturbation size in the order of 100, which causes problem as >>> U0 >>> >> >>> >> has >>> >> >>> >> to be >>> >> >>> >> smaller than 1. >>> >> >>> >> >>> >> >>> >> From my observation, this same perturbation size, say eps, is >>> >> >>> >> applied >>> >> >>> >> on all >>> >> >>> >> U0, U1, U2, etc. <=== Is this the default setting? >>> >> >>> >> I also guess that this eps, in the order of 100, is determined >>> from >>> >> >>> >> my >>> >> >>> >> initial solution vector and other related PETSc parameters. >>> <=== >>> >> >>> >> Is >>> >> >>> >> my >>> >> >>> >> guessing right? >>> >> >>> >> >>> >> >>> >> (3) >>> >> >>> >> >>> ================================================================= >>> >> >>> >> My question: I'd like to avoid a perturbation size ~100 on U0, >>> >> >>> >> i.e., I >>> >> >>> >> have >>> >> >>> >> to limit it to be ~0.01 (or some small number) to avoid the U0 >>> > 1 >>> >> >>> >> situation. Is there any way to control that? >>> >> >>> >> Or, is there any advanced option to control the perturbation >>> size >>> >> >>> >> on >>> >> >>> >> different variables when using snes_mf_operator? >>> >> >>> > >>> >> >>> > Here is a description of the algorithm for calculating h. It >>> seems >>> >> >>> > to >>> >> >>> > me a better way to do this >>> >> >>> > is to non-dimensionalize first. >>> >> >>> >>> >> >>> I forgot the URL: >>> >> >>> >>> >> >>> >>> >> >>> >>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateMFFD.html#MatCreateMFFD >>> >> >>> >>> >> >>> Matt >>> >> >>> >>> >> >>> > Matt >>> >> >>> > >>> >> >>> >> >>> >> >>> >> Hope my explanation is clear. Please let me know if it is not. >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> Best Regards, >>> >> >>> >> >>> >> >>> >> Ling >>> >> >>> >> >>> >> >>> > >>> >> >>> > >>> >> >>> > >>> >> >>> > -- >>> >> >>> > What most experimenters take for granted before they begin their >>> >> >>> > experiments is infinitely more interesting than any results to >>> which >>> >> >>> > their experiments lead. >>> >> >>> > -- Norbert Wiener >>> >> >>> >>> >> >>> >>> >> >>> >>> >> >>> -- >>> >> >>> What most experimenters take for granted before they begin their >>> >> >>> experiments is infinitely more interesting than any results to >>> which >>> >> >>> their experiments lead. >>> >> >>> -- Norbert Wiener >>> >> >> >>> >> >> >>> >> > >>> >> >>> >> >>> >> >>> >> -- >>> >> What most experimenters take for granted before they begin their >>> >> experiments is infinitely more interesting than any results to which >>> >> their experiments lead. >>> >> -- Norbert Wiener >>> > >>> > >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which >>> their experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From m.guterres at gmail.com Thu Dec 13 04:55:10 2012 From: m.guterres at gmail.com (Marcelo Guterres) Date: Thu, 13 Dec 2012 08:55:10 -0200 Subject: [petsc-users] question of a Brazilian student about the use of "PETSC" in a cluster Message-ID: <50C9B40E.5090101@gmail.com> Hello, My name is Marcelo Guterres and I am PhD student in Brazil. I use the PETSC in a cluster with 11 computers, each with 8 processors "Xeon 2.8GHz", "16GB RAM" and "4 HD SAS" from "146GB". My question is about the PETSc following functions: / -> Ierr = MPI_Comm_rank (MPI_COMM_WORLD, & rank); CHKERRQ (ierr); -> Ierr = MPI_Comm_size (MPI_COMM_WORLD, & size); CHKERRQ (ierr);/ For example, using only MPI: ---------------------------------------------------------------------------------------------- // Program hello word using MPI #include using namespace std; #include int main(int argc, char *argv[]) { int size, rank; MPI_Init(&argc, &argv); MPI_Comm_size(MPI_COMM_WORLD,&size); MPI_Comm_rank(MPI_COMM_WORLD,&rank); cout << "hello. I am process " << rank << " of " << size << endl; if ( rank == 0) { cout << "\nFinish !!" << endl; } MPI_Finalize(); } ***** running the command has the following output:* [guterres at stratus hello]$ mpirun -np 3 ./hello hello. I am process 1 of 3 hello. I am process 0 of 3 Finish !! hello. I am process 2 of 3 CONCLUSION: int size = 3; ---------------------------------------------------------------------------------------------- using only the PETSC ---------------------------------------------------------------------------------------------- static char help[] ="\n\n hello word PETSC !!"; #include #include using namespace std; int main( int argc, char *argv[] ) { PetscErrorCode ierr; PetscMPIInt size, rank; ierr = PetscInitialize(&argc,&argv,(char *)0,help);CHKERRQ(ierr); ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank); CHKERRQ(ierr); ierr = MPI_Comm_size(MPI_COMM_WORLD,&size); CHKERRQ(ierr); cout << "hello. I am process " << rank << " of " << size << endl; if ( rank == 0) { cout << "\nfinish !!" << endl; } ierr = PetscFinalize( ); CHKERRQ(ierr); return 0; } ***** running the command has the following output:* [guterres at stratus hello_petsc]$ mpirun -np 3 ./hello hello. I am process 0 of 1 finish !! hello. I am process 0 of 1 finish !! hello. I am process 0 of 1 finish !! ---------------------------------------------------------------------------------------------- *MY QUESTION IS:* THE OUTPUT OF THE PROGRAM WITH PETSC CORRECT??? the variable value PetscMPIInt size = np ?? The correct output should not be: [guterres at stratus hello_petsc]$ mpirun -np 3 ./hello hello. I am process 0 of 3 hello. I am process 1 of 3 hello. I am process 2 of 3 finish !! CONCLUSION: PetscMPIInt size = 3 ?? Thank you for your attention and excuse my writing in English. Marcelo Guterres -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Dec 13 07:23:59 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 13 Dec 2012 07:23:59 -0600 Subject: [petsc-users] question of a Brazilian student about the use of "PETSC" in a cluster In-Reply-To: <50C9B40E.5090101@gmail.com> References: <50C9B40E.5090101@gmail.com> Message-ID: <2B738BAF-509D-42DF-8523-69E9A2D8D32F@mcs.anl.gov> > Marcelo, There is something wrong with the PETSc install. Are you absolutely sure that PETSc was ./configure with the same MPI as as the mpirun that you use to launch the program? You can send configure.log to petsc-maint at mcs.anl.gov if you cannot figure it out on your own. 
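(A quick way to check this, with illustrative paths: every process printing "0 of 1" is the classic symptom of launching a binary built against one MPI with the mpirun of a different MPI. Comparing

    which mpirun                        # the launcher being used
    which mpicc                         # the wrappers the code was compiled with
    grep -i mpiexec configure.log       # what PETSc's configure found or built

usually shows the mismatch. If PETSc was configured with --download-mpich, the matching launcher is the one PETSc installed, e.g.

    $PETSC_DIR/$PETSC_ARCH/bin/mpiexec -np 3 ./hello

rather than the system mpirun.)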
Barry On Dec 13, 2012, at 4:55 AM, Marcelo Guterres wrote: > Hello, > > My name is Marcelo Guterres and I am PhD student in Brazil. > > I use the PETSC in a cluster with 11 computers, each with 8 processors "Xeon 2.8GHz", "16GB RAM" and "4 HD SAS" from "146GB". > > My question is about the PETSc following functions: > > -> Ierr = MPI_Comm_rank (MPI_COMM_WORLD, & rank); CHKERRQ (ierr); > -> Ierr = MPI_Comm_size (MPI_COMM_WORLD, & size); CHKERRQ (ierr); > > > For example, using only MPI: > > ---------------------------------------------------------------------------------------------- > > // Program hello word using MPI > > #include > using namespace std; > #include > > int main(int argc, char *argv[]) > > { > int size, rank; > MPI_Init(&argc, &argv); > MPI_Comm_size(MPI_COMM_WORLD,&size); > MPI_Comm_rank(MPI_COMM_WORLD,&rank); > > cout << "hello. I am process " << rank << " of " << size << endl; > > if ( rank == 0) > { > cout << "\nFinish !!" << endl; > } > > MPI_Finalize(); > } > > > > **** running the command has the following output: > > [guterres at stratus hello]$ mpirun -np 3 ./hello > > hello. I am process 1 of 3 > hello. I am process 0 of 3 > > Finish !! > hello. I am process 2 of 3 > > > CONCLUSION: int size = 3; > > ---------------------------------------------------------------------------------------------- > > > using only the PETSC > > ---------------------------------------------------------------------------------------------- > > > static char help[] ="\n\n hello word PETSC !!"; > #include > #include > using namespace std; > > int main( int argc, char *argv[] ) > { > PetscErrorCode ierr; > > PetscMPIInt size, > rank; > > ierr = PetscInitialize(&argc,&argv,(char *)0,help);CHKERRQ(ierr); > ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank); CHKERRQ(ierr); > ierr = MPI_Comm_size(MPI_COMM_WORLD,&size); CHKERRQ(ierr); > > cout << "hello. I am process " << rank << " of " << size << endl; > > if ( rank == 0) > { > cout << "\nfinish !!" << endl; > } > > > ierr = PetscFinalize( ); CHKERRQ(ierr); > return 0; > } > > > **** running the command has the following output: > > > [guterres at stratus hello_petsc]$ mpirun -np 3 ./hello > > hello. I am process 0 of 1 > finish !! > > hello. I am process 0 of 1 > finish !! > > hello. I am process 0 of 1 > finish !! > > > ---------------------------------------------------------------------------------------------- > > > MY QUESTION IS: > > > THE OUTPUT OF THE PROGRAM WITH PETSC CORRECT??? > > the variable value PetscMPIInt size = np ?? > > The correct output should not be: > > > [guterres at stratus hello_petsc]$ mpirun -np 3 ./hello > > hello. I am process 0 of 3 > > hello. I am process 1 of 3 > > hello. I am process 2 of 3 > > finish !! > > CONCLUSION: PetscMPIInt size = 3 ?? > > > Thank you for your attention and excuse my writing in English. > > > Marcelo Guterres From thomas.witkowski at tu-dresden.de Thu Dec 13 09:41:33 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 13 Dec 2012 16:41:33 +0100 Subject: [petsc-users] Some tricky problem in my multilevel feti dp code Message-ID: <50C9F72D.2010006@tu-dresden.de> I have some problem in the implementation of my multilevel FETI DP code, where two block structured matrices must be multiplied. I'll give my best to explain the problem, may be one of you have an idea how to implement it. I think, the best is to make a small example: lets assume we have 16 subdomains, uniformly subdividing a unit square. 
Each of the subdomain matrices is purely local, thus they have the communicator PETSC_COMM_SELF. Each of them is of size n x n. There is a coarse grid matrix, with communicator PETSC_COMM_WORLD and of size m x m. The coupling matrices between the global coarse grid and the local matrices are also global, so they are of size 16n x m and m x 16n, respectively. So far, everything is fine and works perfectly. Now I introduce four "local coarse grids", each of them couples four local subdomains, and is defined on a subset communicator of PETSC_COMM_WORLD. Say, each "local coarse grid" matrix is of size p x p, and there are also coupling matrices of size 4n x p and p x 4n. Now I have to perform a MatMatMult of the local coarse coupling matrices p x 4n with the global coupling matrix 16n x m. So the final matrix is of size 4p x m. But I cannot perform the MatMatMult, as the matrix sizes do not fit and the communicators are not compatible. Is it possible to understand, what I want to do? :) Any idea, how to implement it? Thomas From knepley at gmail.com Thu Dec 13 10:43:36 2012 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Dec 2012 08:43:36 -0800 Subject: [petsc-users] Some tricky problem in my multilevel feti dp code In-Reply-To: <50C9F72D.2010006@tu-dresden.de> References: <50C9F72D.2010006@tu-dresden.de> Message-ID: On Thu, Dec 13, 2012 at 7:41 AM, Thomas Witkowski wrote: > I have some problem in the implementation of my multilevel FETI DP code, > where two block structured matrices must be multiplied. I'll give my best to > explain the problem, may be one of you have an idea how to implement it. I > think, the best is to make a small example: lets assume we have 16 > subdomains, uniformly subdividing a unit square. Each of the subdomain > matrices is purely local, thus they have the communicator PETSC_COMM_SELF. > Each of them is of size n x n. There is a coarse grid matrix, with > communicator PETSC_COMM_WORLD and of size m x m. The coupling matrices > between the global coarse grid and the local matrices are also global, so > they are of size 16n x m and m x 16n, respectively. So far, everything is > fine and works perfectly. Now I introduce four "local coarse grids", each of > them couples four local subdomains, and is defined on a subset communicator > of PETSC_COMM_WORLD. Say, each "local coarse grid" matrix is of size p x p, > and there are also coupling matrices of size 4n x p and p x 4n. Now I have > to perform a MatMatMult of the local coarse coupling matrices p x 4n with > the global coupling matrix 16n x m. So the final matrix is of size 4p x m. > But I cannot perform the MatMatMult, as the matrix sizes do not fit and the > communicators are not compatible. > > Is it possible to understand, what I want to do? :) Any idea, how to > implement it? It sounds like you need to redistribute the matrix before the MatMatMult. I think you can do this with MatGetSubmatrix(), if I understand your problem correctly. You probably need to move the matrix from the subcomm to the global comm first, with empty entries on some procs. I would just do it the naive way first, then profile to see how it does. Matt > Thomas -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From mark.adams at columbia.edu Thu Dec 13 11:22:41 2012 From: mark.adams at columbia.edu (Mark F. 
Adams) Date: Thu, 13 Dec 2012 12:22:41 -0500 Subject: [petsc-users] Some tricky problem in my multilevel feti dp code In-Reply-To: References: <50C9F72D.2010006@tu-dresden.de> Message-ID: <1F23D79C-569C-405B-A68C-BAB9E310C2D3@columbia.edu> You might also be able to put everything in a global comm. Then you have a bunch of block diagonal matrix ops. There is no performance penalty other then in reductions (e.g., when PETSc figures out its scatter stuff) and you have to keep track of where the local problem "starts" in the global matrix, but it might be simpler. Note, your local LU solves will now be a global looking block Jacobi with a sub LU solver, but its the same thing. On Dec 13, 2012, at 11:43 AM, Matthew Knepley wrote: > On Thu, Dec 13, 2012 at 7:41 AM, Thomas Witkowski > wrote: >> I have some problem in the implementation of my multilevel FETI DP code, >> where two block structured matrices must be multiplied. I'll give my best to >> explain the problem, may be one of you have an idea how to implement it. I >> think, the best is to make a small example: lets assume we have 16 >> subdomains, uniformly subdividing a unit square. Each of the subdomain >> matrices is purely local, thus they have the communicator PETSC_COMM_SELF. >> Each of them is of size n x n. There is a coarse grid matrix, with >> communicator PETSC_COMM_WORLD and of size m x m. The coupling matrices >> between the global coarse grid and the local matrices are also global, so >> they are of size 16n x m and m x 16n, respectively. So far, everything is >> fine and works perfectly. Now I introduce four "local coarse grids", each of >> them couples four local subdomains, and is defined on a subset communicator >> of PETSC_COMM_WORLD. Say, each "local coarse grid" matrix is of size p x p, >> and there are also coupling matrices of size 4n x p and p x 4n. Now I have >> to perform a MatMatMult of the local coarse coupling matrices p x 4n with >> the global coupling matrix 16n x m. So the final matrix is of size 4p x m. >> But I cannot perform the MatMatMult, as the matrix sizes do not fit and the >> communicators are not compatible. >> >> Is it possible to understand, what I want to do? :) Any idea, how to >> implement it? > > It sounds like you need to redistribute the matrix before the > MatMatMult. I think you > can do this with MatGetSubmatrix(), if I understand your problem > correctly. You probably > need to move the matrix from the subcomm to the global comm first, > with empty entries > on some procs. I would just do it the naive way first, then profile to > see how it does. > > Matt > >> Thomas > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. 
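A rough sketch of the single-communicator approach Mark suggests, using the sizes from Thomas's example: the four p x 4n group coupling blocks become one block-diagonal (4p) x (16n) MPIAIJ matrix on PETSC_COMM_WORLD, and the product with the global 16n x m coupling matrix is then an ordinary MatMatMult. Everything below is illustrative; p, n, the local row and column shares, the preallocation counts and the assembled matrix Bglob are placeholders, not code from this thread.

/* Sketch only: each rank owns prows_local of the 4p rows and ncols_local of
   the 16n columns, and inserts only the entries of its own group's p x 4n
   block, shifted by that group's global row and column offsets. */
Mat      Cglob, Bglob, CB;          /* Bglob: the existing 16n x m coupling matrix */
PetscInt prows_local, ncols_local, p, n, d_nz, o_nz;

ierr = MatCreate(PETSC_COMM_WORLD, &Cglob); CHKERRQ(ierr);
ierr = MatSetSizes(Cglob, prows_local, ncols_local, 4*p, 16*n); CHKERRQ(ierr);
ierr = MatSetType(Cglob, MATMPIAIJ); CHKERRQ(ierr);
ierr = MatMPIAIJSetPreallocation(Cglob, d_nz, PETSC_NULL, o_nz, PETSC_NULL); CHKERRQ(ierr);

/* ... MatSetValues() with globally shifted indices, on group members only ... */

ierr = MatAssemblyBegin(Cglob, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
ierr = MatAssemblyEnd(Cglob, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

/* sizes and communicators now match: (4p x 16n) * (16n x m) = 4p x m */
ierr = MatMatMult(Cglob, Bglob, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CB); CHKERRQ(ierr);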
> -- Norbert Wiener > From gokhalen at gmail.com Thu Dec 13 11:50:42 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 13 Dec 2012 12:50:42 -0500 Subject: [petsc-users] MatCreateComposite question Message-ID: I am trying to create a composite matrix - the relevant snippet of the code is - Mat KK, AA[3]; ierr = MatDuplicate(KFullMat,MAT_COPY_VALUES,&AA[0]);CHKERRQ(ierr); ierr = MatDuplicate(CFullMat,MAT_COPY_VALUES,&AA[1]); CHKERRQ(ierr); ierr = MatDuplicate(MFullMat,MAT_COPY_VALUES,&AA[2]); CHKERRQ(ierr); ierr = MatScale(AA[1],iomega);CHKERRQ(ierr); ierr = MatScale(AA[2],-forcomega2); CHKERRQ(ierr); ierr = MatCreateComposite(PETSC_COMM_WORLD,3,AA,&KK); CHKERRQ(ierr); ierr = MatCompositeMerge(KK); CHKERRQ(ierr); This crashes with the error at the end of the message. Do you have any ideas about what might be causing this? Is there any other debugging output I should send - log_summary perhaps? -Nachiket [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Object is in wrong state! [0]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatAssemblyBegin()! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Thu Dec 13 12:54:51 2012 [0]PETSC ERROR: Libraries linked from /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib [0]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 [2]PETSC ERROR: --------------------- Error Message ------------------------------------ [2]PETSC ERROR: Object is in wrong state! [2]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatAssemblyBegin()! [2]PETSC ERROR: ------------------------------------------------------------------------ [2]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012 [2]PETSC ERROR: See docs/changes/index.html for recent updates. [2]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [2]PETSC ERROR: See docs/index.html for manual pages. [2]PETSC ERROR: [3]PETSC ERROR: --------------------- Error Message ------------------------------------ [3]PETSC ERROR: Object is in wrong state! [3]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatAssemblyBegin()! [3]PETSC ERROR: ------------------------------------------------------------------------ [3]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012 [3]PETSC ERROR: See docs/changes/index.html for recent updates. [3]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [3]PETSC ERROR: See docs/index.html for manual pages. 
[3]PETSC ERROR: ------------------------------------------------------------------------ [3]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Thu Dec 13 12:54:51 2012 [3]PETSC ERROR: [0]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: MatAssemblyBegin() line 4683 in /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c [0]PETSC ERROR: MatCreateComposite() line 440 in /opt/petsc/petsc-3.3-p2/src/mat/impls/composite/mcomposite.c [0]PETSC ERROR: main() line 141 in src/examples/waigen.c ------------------------------------------------------------------------ [2]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Thu Dec 13 12:54:51 2012 [2]PETSC ERROR: Libraries linked from /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib [2]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 [2]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [2]PETSC ERROR: ------------------------------------------------------------------------ Libraries linked from /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib [3]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 [3]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [3]PETSC ERROR: ------------------------------------------------------------------------ [3]PETSC ERROR: MatAssemblyBegin() line 4683 in /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c [3]PETSC ERROR: application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 [2]PETSC ERROR: MatAssemblyBegin() line 4683 in /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c [2]PETSC ERROR: MatCreateComposite() line 440 in /opt/petsc/petsc-3.3-p2/src/mat/impls/composite/mcomposite.c MatCreateComposite() line 440 in /opt/petsc/petsc-3.3-p2/src/mat/impls/composite/mcomposite.c [3]PETSC ERROR: main() line 141 in src/examples/waigen.c [2]PETSC ERROR: main() line 141 in src/examples/waigen.c application called MPI_Abort(MPI_COMM_WORLD, 73) - process 3 application called MPI_Abort(MPI_COMM_WORLD, 73) - process 2 [cli_2]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 73) - process 2 [cli_3]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 73) - process 3 [1]PETSC ERROR: --------------------- Error Message ------------------------------------ [1]PETSC ERROR: Object is in wrong state! [1]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatAssemblyBegin()! 
[1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012 [1]PETSC ERROR: See docs/changes/index.html for recent updates. [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [1]PETSC ERROR: See docs/index.html for manual pages. [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Thu Dec 13 12:54:51 2012 [1]PETSC ERROR: Libraries linked from /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib [1]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 [1]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: MatAssemblyBegin() line 4683 in /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c [1]PETSC ERROR: MatCreateComposite() line 440 in /opt/petsc/petsc-3.3-p2/src/mat/impls/composite/mcomposite.c [1]PETSC ERROR: main() line 141 in src/examples/waigen.c application called MPI_Abort(MPI_COMM_WORLD, 73) - process 1 [cli_1]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 73) - process 1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhalen at gmail.com Thu Dec 13 12:47:35 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 13 Dec 2012 13:47:35 -0500 Subject: [petsc-users] MatCreateComposite question Message-ID: Sorry for replying to my own question, but it seems to be working in optimized mode, but not in Debug mode, which is strange. -Nachiket -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Thu Dec 13 12:55:47 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Thu, 13 Dec 2012 10:55:47 -0800 Subject: [petsc-users] MatCreateComposite question In-Reply-To: References: Message-ID: That's because the check isn't firing in optimized mode. Barry, should MatCreate_Composite() set mat->preallocated = TRUE because preallocation is implicit for composite matrices, or should the user/MatCreateComposite() be responsible for calling MatSetUp()? On Thu, Dec 13, 2012 at 10:47 AM, Nachiket Gokhale wrote: > Sorry for replying to my own question, but it seems to be working in > optimized mode, but not in Debug mode, which is strange. -Nachiket -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Dec 13 13:40:41 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 13 Dec 2012 13:40:41 -0600 Subject: [petsc-users] MatCreateComposite question In-Reply-To: References: Message-ID: <909099E5-7A0D-491B-B2CE-8B9C2C1A6F8A@mcs.anl.gov> On Dec 13, 2012, at 12:55 PM, Jed Brown wrote: > That's because the check isn't firing in optimized mode. > > Barry, should MatCreate_Composite() set mat->preallocated = TRUE because preallocation is implicit for composite matrices, or should the user/MatCreateComposite() be responsible for calling MatSetUp()? I am fine with having it click to mat->preallocated automatically for now. 
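For anyone hitting this on an unpatched petsc-3.3 debug build, one possible user-side workaround is to assemble the composite by hand rather than through MatCreateComposite(), so that MatSetUp() runs before the assembly that trips the preallocation check. This is only a sketch of that idea, assuming AA[0..2] are the three already-assembled component matrices from the snippet earlier in the thread and that they share the same layout; with the fix linked below applied it should not be needed.

Mat      KK;
PetscInt mloc, nloc, M, N;

ierr = MatCreate(PETSC_COMM_WORLD, &KK); CHKERRQ(ierr);
/* give the composite the same local and global sizes as its components */
ierr = MatGetLocalSize(AA[0], &mloc, &nloc); CHKERRQ(ierr);
ierr = MatGetSize(AA[0], &M, &N); CHKERRQ(ierr);
ierr = MatSetSizes(KK, mloc, nloc, M, N); CHKERRQ(ierr);
ierr = MatSetType(KK, MATCOMPOSITE); CHKERRQ(ierr);
ierr = MatSetUp(KK); CHKERRQ(ierr);                /* marks the matrix as preallocated */
ierr = MatCompositeAddMat(KK, AA[0]); CHKERRQ(ierr);
ierr = MatCompositeAddMat(KK, AA[1]); CHKERRQ(ierr);
ierr = MatCompositeAddMat(KK, AA[2]); CHKERRQ(ierr);
ierr = MatAssemblyBegin(KK, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
ierr = MatAssemblyEnd(KK, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
ierr = MatCompositeMerge(KK); CHKERRQ(ierr);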
Barry > > > On Thu, Dec 13, 2012 at 10:47 AM, Nachiket Gokhale wrote: > Sorry for replying to my own question, but it seems to be working in optimized mode, but not in Debug mode, which is strange. -Nachiket > From jedbrown at mcs.anl.gov Thu Dec 13 13:44:34 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Thu, 13 Dec 2012 11:44:34 -0800 Subject: [petsc-users] MatCreateComposite question In-Reply-To: <909099E5-7A0D-491B-B2CE-8B9C2C1A6F8A@mcs.anl.gov> References: <909099E5-7A0D-491B-B2CE-8B9C2C1A6F8A@mcs.anl.gov> Message-ID: https://bitbucket.org/petsc/petsc-3.3/commits/ceb522f2c6640c2934693f744f823595bb0438fc On Thu, Dec 13, 2012 at 11:40 AM, Barry Smith wrote: > > On Dec 13, 2012, at 12:55 PM, Jed Brown wrote: > > > That's because the check isn't firing in optimized mode. > > > > Barry, should MatCreate_Composite() set mat->preallocated = TRUE because > preallocation is implicit for composite matrices, or should the > user/MatCreateComposite() be responsible for calling MatSetUp()? > > I am fine with having it click to mat->preallocated automatically for > now. > > Barry > > > > > > > On Thu, Dec 13, 2012 at 10:47 AM, Nachiket Gokhale > wrote: > > Sorry for replying to my own question, but it seems to be working in > optimized mode, but not in Debug mode, which is strange. -Nachiket > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhalen at gmail.com Thu Dec 13 14:20:43 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 13 Dec 2012 15:20:43 -0500 Subject: [petsc-users] MatCreateComposite question In-Reply-To: References: <909099E5-7A0D-491B-B2CE-8B9C2C1A6F8A@mcs.anl.gov> Message-ID: Thanks, that seems to work. On Thu, Dec 13, 2012 at 2:44 PM, Jed Brown wrote: > > https://bitbucket.org/petsc/petsc-3.3/commits/ceb522f2c6640c2934693f744f823595bb0438fc > > > > On Thu, Dec 13, 2012 at 11:40 AM, Barry Smith wrote: > >> >> On Dec 13, 2012, at 12:55 PM, Jed Brown wrote: >> >> > That's because the check isn't firing in optimized mode. >> > >> > Barry, should MatCreate_Composite() set mat->preallocated = TRUE >> because preallocation is implicit for composite matrices, or should the >> user/MatCreateComposite() be responsible for calling MatSetUp()? >> >> I am fine with having it click to mat->preallocated automatically for >> now. >> >> Barry >> >> > >> > >> > On Thu, Dec 13, 2012 at 10:47 AM, Nachiket Gokhale >> wrote: >> > Sorry for replying to my own question, but it seems to be working in >> optimized mode, but not in Debug mode, which is strange. -Nachiket >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhalen at gmail.com Thu Dec 13 15:20:37 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 13 Dec 2012 16:20:37 -0500 Subject: [petsc-users] MUMPS Stuck Message-ID: I am trying to solve a complex matrix equation which was assembled using MatCompositeMerge using MUMPS and LU preconditioner. It seems to me that the solve is stuck in the factorization phase. It is taking 20 mins or so, using 16 processes. A problem of the same size using reals instead of complex was solved previously in approximately a minute using 4 processes. Mumps output of *-mat_mumps_icntl_4 1 *at the end of this email. Does anyone have any ideas about what the problem maybe ? 
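For context, the setup being described, a direct solve with MUMPS behind a KSP, usually amounts to the fragment below or the equivalent runtime options -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package mumps. The names KK, b and x are placeholders rather than the application's code; raising -mat_mumps_icntl_4 above 1 makes MUMPS print more of its statistics, which can help tell a slow factorization from a hung one.

/* Generic sketch: direct solve with MUMPS through KSP (petsc-3.3 calling
   sequence); KK is the assembled system matrix, b the right-hand side. */
KSP ksp;
PC  pc;

ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
ierr = KSPSetOperators(ksp, KK, KK, DIFFERENT_NONZERO_PATTERN); CHKERRQ(ierr);
ierr = KSPSetType(ksp, KSPPREONLY); CHKERRQ(ierr);
ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
ierr = PCSetType(pc, PCLU); CHKERRQ(ierr);
ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS); CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);
ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);   /* the numerical factorization happens here */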
Thanks, -Nachiket * * * * Entering ZMUMPS driver with JOB, N, NZ = 1 122370 0 ZMUMPS 4.10.0 L U Solver for unsymmetric matrices Type of parallelism: Working host ****** ANALYSIS STEP ******** ** Max-trans not allowed because matrix is distributed ... Structural symmetry (in percent)= 100 Density: NBdense, Average, Median = 0 42 26 Ordering based on METIS A root of estimated size 2736 has been selected for Scalapack. Leaving analysis phase with ... INFOG(1) = 0 INFOG(2) = 0 -- (20) Number of entries in factors (estim.) = 563723522 -- (3) Storage of factors (REAL, estimated) = 565185337 -- (4) Storage of factors (INT , estimated) = 3537003 -- (5) Maximum frontal size (estimated) = 15239 -- (6) Number of nodes in the tree = 7914 -- (32) Type of analysis effectively used = 1 -- (7) Ordering option effectively used = 5 ICNTL(6) Maximum transversal option = 0 ICNTL(7) Pivot order option = 7 Percentage of memory relaxation (effective) = 35 Number of level 2 nodes = 35 Number of split nodes = 8 RINFOG(1) Operations during elimination (estim)= 4.877D+12 Distributed matrix entry format (ICNTL(18)) = 3 ** Rank of proc needing largest memory in IC facto : 0 ** Estimated corresponding MBYTES for IC facto : 3661 ** Estimated avg. MBYTES per work. proc at facto (IC) : 2018 ** TOTAL space in MBYTES for IC factorization : 32289 ** Rank of proc needing largest memory for OOC facto : 0 ** Estimated corresponding MBYTES for OOC facto : 3462 ** Estimated avg. MBYTES per work. proc at facto (OOC) : 1787 ** TOTAL space in MBYTES for OOC factorization : 28599 Entering ZMUMPS driver with JOB, N, NZ = 2 122370 5211070 ****** FACTORIZATION STEP ******** GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... NUMBER OF WORKING PROCESSES = 16 OUT-OF-CORE OPTION (ICNTL(22)) = 0 REAL SPACE FOR FACTORS = 565185337 INTEGER SPACE FOR FACTORS = 3537003 MAXIMUM FRONTAL SIZE (ESTIMATED) = 15239 NUMBER OF NODES IN THE TREE = 7914 Convergence error after scaling for ONE-NORM (option 7/8) = 0.79D+00 Maximum effective relaxed size of S = 199523439 Average effective relaxed size of S = 98303057 REDISTRIB: TOTAL DATA LOCAL/SENT = 657185 14022665 GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.4805 ** Memory relaxation parameter ( ICNTL(14) ) : 35 ** Rank of processor needing largest memory in facto : 0 ** Space in MBYTES used by this processor for facto : 3661 ** Avg. Space in MBYTES per working proc during facto : 2018 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Dec 13 15:29:05 2012 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Dec 2012 13:29:05 -0800 Subject: [petsc-users] MUMPS Stuck In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 1:20 PM, Nachiket Gokhale wrote: > I am trying to solve a complex matrix equation which was assembled using > MatCompositeMerge using MUMPS and LU preconditioner. It seems to me that > the solve is stuck in the factorization phase. It is taking 20 mins or so, > using 16 processes. A problem of the same size using reals instead of > complex was solved previously in approximately a minute using 4 processes. > Mumps output of -mat_mumps_icntl_4 1 at the end of this email. Does anyone > have any ideas about what the problem maybe ? Complex arithmetic is much more expensive, and you can lose some of the optimizations made in the code. I think you have to wait longer than this. Also, you should try attaching the debugger to a process to see whether it is computing or waiting. 
Matt > Thanks, > > -Nachiket > > > > Entering ZMUMPS driver with JOB, N, NZ = 1 122370 0 > > ZMUMPS 4.10.0 > L U Solver for unsymmetric matrices > Type of parallelism: Working host > > ****** ANALYSIS STEP ******** > > ** Max-trans not allowed because matrix is distributed > ... Structural symmetry (in percent)= 100 > Density: NBdense, Average, Median = 0 42 26 > Ordering based on METIS > A root of estimated size 2736 has been selected for Scalapack. > > Leaving analysis phase with ... > INFOG(1) = 0 > INFOG(2) = 0 > -- (20) Number of entries in factors (estim.) = 563723522 > -- (3) Storage of factors (REAL, estimated) = 565185337 > -- (4) Storage of factors (INT , estimated) = 3537003 > -- (5) Maximum frontal size (estimated) = 15239 > -- (6) Number of nodes in the tree = 7914 > -- (32) Type of analysis effectively used = 1 > -- (7) Ordering option effectively used = 5 > ICNTL(6) Maximum transversal option = 0 > ICNTL(7) Pivot order option = 7 > Percentage of memory relaxation (effective) = 35 > Number of level 2 nodes = 35 > Number of split nodes = 8 > RINFOG(1) Operations during elimination (estim)= 4.877D+12 > Distributed matrix entry format (ICNTL(18)) = 3 > ** Rank of proc needing largest memory in IC facto : 0 > ** Estimated corresponding MBYTES for IC facto : 3661 > ** Estimated avg. MBYTES per work. proc at facto (IC) : 2018 > ** TOTAL space in MBYTES for IC factorization : 32289 > ** Rank of proc needing largest memory for OOC facto : 0 > ** Estimated corresponding MBYTES for OOC facto : 3462 > ** Estimated avg. MBYTES per work. proc at facto (OOC) : 1787 > ** TOTAL space in MBYTES for OOC factorization : 28599 > Entering ZMUMPS driver with JOB, N, NZ = 2 122370 5211070 > > ****** FACTORIZATION STEP ******** > > > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... > NUMBER OF WORKING PROCESSES = 16 > OUT-OF-CORE OPTION (ICNTL(22)) = 0 > REAL SPACE FOR FACTORS = 565185337 > INTEGER SPACE FOR FACTORS = 3537003 > MAXIMUM FRONTAL SIZE (ESTIMATED) = 15239 > NUMBER OF NODES IN THE TREE = 7914 > Convergence error after scaling for ONE-NORM (option 7/8) = 0.79D+00 > Maximum effective relaxed size of S = 199523439 > Average effective relaxed size of S = 98303057 > > REDISTRIB: TOTAL DATA LOCAL/SENT = 657185 14022665 > GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.4805 > ** Memory relaxation parameter ( ICNTL(14) ) : 35 > ** Rank of processor needing largest memory in facto : 0 > ** Space in MBYTES used by this processor for facto : 3661 > ** Avg. Space in MBYTES per working proc during facto : 2018 > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From gokhalen at gmail.com Thu Dec 13 15:44:44 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 13 Dec 2012 16:44:44 -0500 Subject: [petsc-users] MUMPS Stuck In-Reply-To: References: Message-ID: Thanks - should I attached the debugger in debug mode or in optimized mode? I suspect it will be tremendously slow in debug mode, otoh I am not sure if it will yield any useful information in optimized mode. Also, will -on_error_attach_debugger do the trick? -Nachiket On Thu, Dec 13, 2012 at 4:29 PM, Matthew Knepley wrote: > On Thu, Dec 13, 2012 at 1:20 PM, Nachiket Gokhale > wrote: > > I am trying to solve a complex matrix equation which was assembled using > > MatCompositeMerge using MUMPS and LU preconditioner. It seems to me that > > the solve is stuck in the factorization phase. 
It is taking 20 mins or > so, > > using 16 processes. A problem of the same size using reals instead of > > complex was solved previously in approximately a minute using 4 > processes. > > Mumps output of -mat_mumps_icntl_4 1 at the end of this email. Does > anyone > > have any ideas about what the problem maybe ? > > Complex arithmetic is much more expensive, and you can lose some of > the optimizations > made in the code. I think you have to wait longer than this. Also, you > should try attaching > the debugger to a process to see whether it is computing or waiting. > > Matt > > > Thanks, > > > > -Nachiket > > > > > > > > Entering ZMUMPS driver with JOB, N, NZ = 1 122370 0 > > > > ZMUMPS 4.10.0 > > L U Solver for unsymmetric matrices > > Type of parallelism: Working host > > > > ****** ANALYSIS STEP ******** > > > > ** Max-trans not allowed because matrix is distributed > > ... Structural symmetry (in percent)= 100 > > Density: NBdense, Average, Median = 0 42 26 > > Ordering based on METIS > > A root of estimated size 2736 has been selected for Scalapack. > > > > Leaving analysis phase with ... > > INFOG(1) = 0 > > INFOG(2) = 0 > > -- (20) Number of entries in factors (estim.) = 563723522 > > -- (3) Storage of factors (REAL, estimated) = 565185337 > > -- (4) Storage of factors (INT , estimated) = 3537003 > > -- (5) Maximum frontal size (estimated) = 15239 > > -- (6) Number of nodes in the tree = 7914 > > -- (32) Type of analysis effectively used = 1 > > -- (7) Ordering option effectively used = 5 > > ICNTL(6) Maximum transversal option = 0 > > ICNTL(7) Pivot order option = 7 > > Percentage of memory relaxation (effective) = 35 > > Number of level 2 nodes = 35 > > Number of split nodes = 8 > > RINFOG(1) Operations during elimination (estim)= 4.877D+12 > > Distributed matrix entry format (ICNTL(18)) = 3 > > ** Rank of proc needing largest memory in IC facto : 0 > > ** Estimated corresponding MBYTES for IC facto : 3661 > > ** Estimated avg. MBYTES per work. proc at facto (IC) : 2018 > > ** TOTAL space in MBYTES for IC factorization : 32289 > > ** Rank of proc needing largest memory for OOC facto : 0 > > ** Estimated corresponding MBYTES for OOC facto : 3462 > > ** Estimated avg. MBYTES per work. proc at facto (OOC) : 1787 > > ** TOTAL space in MBYTES for OOC factorization : 28599 > > Entering ZMUMPS driver with JOB, N, NZ = 2 122370 5211070 > > > > ****** FACTORIZATION STEP ******** > > > > > > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... > > NUMBER OF WORKING PROCESSES = 16 > > OUT-OF-CORE OPTION (ICNTL(22)) = 0 > > REAL SPACE FOR FACTORS = 565185337 > > INTEGER SPACE FOR FACTORS = 3537003 > > MAXIMUM FRONTAL SIZE (ESTIMATED) = 15239 > > NUMBER OF NODES IN THE TREE = 7914 > > Convergence error after scaling for ONE-NORM (option 7/8) = 0.79D+00 > > Maximum effective relaxed size of S = 199523439 > > Average effective relaxed size of S = 98303057 > > > > REDISTRIB: TOTAL DATA LOCAL/SENT = 657185 14022665 > > GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.4805 > > ** Memory relaxation parameter ( ICNTL(14) ) : 35 > > ** Rank of processor needing largest memory in facto : 0 > > ** Space in MBYTES used by this processor for facto : 3661 > > ** Avg. Space in MBYTES per working proc during facto : 2018 > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Thu Dec 13 16:19:32 2012 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Dec 2012 14:19:32 -0800 Subject: [petsc-users] MUMPS Stuck In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 1:44 PM, Nachiket Gokhale wrote: > Thanks - should I attached the debugger in debug mode or in optimized mode? > I suspect it will be tremendously slow in debug mode, otoh I am not sure if > it will yield any useful information in optimized mode. Optimized will still give a stack trace. > Also, will -on_error_attach_debugger do the trick? No, either spawn one -start_in_debugger -debugger_nodes 0, or attach using gdb -p Matt > -Nachiket > > On Thu, Dec 13, 2012 at 4:29 PM, Matthew Knepley wrote: >> >> On Thu, Dec 13, 2012 at 1:20 PM, Nachiket Gokhale >> wrote: >> > I am trying to solve a complex matrix equation which was assembled using >> > MatCompositeMerge using MUMPS and LU preconditioner. It seems to me >> > that >> > the solve is stuck in the factorization phase. It is taking 20 mins or >> > so, >> > using 16 processes. A problem of the same size using reals instead of >> > complex was solved previously in approximately a minute using 4 >> > processes. >> > Mumps output of -mat_mumps_icntl_4 1 at the end of this email. Does >> > anyone >> > have any ideas about what the problem maybe ? >> >> Complex arithmetic is much more expensive, and you can lose some of >> the optimizations >> made in the code. I think you have to wait longer than this. Also, you >> should try attaching >> the debugger to a process to see whether it is computing or waiting. >> >> Matt >> >> > Thanks, >> > >> > -Nachiket >> > >> > >> > >> > Entering ZMUMPS driver with JOB, N, NZ = 1 122370 0 >> > >> > ZMUMPS 4.10.0 >> > L U Solver for unsymmetric matrices >> > Type of parallelism: Working host >> > >> > ****** ANALYSIS STEP ******** >> > >> > ** Max-trans not allowed because matrix is distributed >> > ... Structural symmetry (in percent)= 100 >> > Density: NBdense, Average, Median = 0 42 26 >> > Ordering based on METIS >> > A root of estimated size 2736 has been selected for Scalapack. >> > >> > Leaving analysis phase with ... >> > INFOG(1) = 0 >> > INFOG(2) = 0 >> > -- (20) Number of entries in factors (estim.) = 563723522 >> > -- (3) Storage of factors (REAL, estimated) = 565185337 >> > -- (4) Storage of factors (INT , estimated) = 3537003 >> > -- (5) Maximum frontal size (estimated) = 15239 >> > -- (6) Number of nodes in the tree = 7914 >> > -- (32) Type of analysis effectively used = 1 >> > -- (7) Ordering option effectively used = 5 >> > ICNTL(6) Maximum transversal option = 0 >> > ICNTL(7) Pivot order option = 7 >> > Percentage of memory relaxation (effective) = 35 >> > Number of level 2 nodes = 35 >> > Number of split nodes = 8 >> > RINFOG(1) Operations during elimination (estim)= 4.877D+12 >> > Distributed matrix entry format (ICNTL(18)) = 3 >> > ** Rank of proc needing largest memory in IC facto : 0 >> > ** Estimated corresponding MBYTES for IC facto : 3661 >> > ** Estimated avg. MBYTES per work. proc at facto (IC) : 2018 >> > ** TOTAL space in MBYTES for IC factorization : 32289 >> > ** Rank of proc needing largest memory for OOC facto : 0 >> > ** Estimated corresponding MBYTES for OOC facto : 3462 >> > ** Estimated avg. MBYTES per work. 
proc at facto (OOC) : 1787 >> > ** TOTAL space in MBYTES for OOC factorization : 28599 >> > Entering ZMUMPS driver with JOB, N, NZ = 2 122370 5211070 >> > >> > ****** FACTORIZATION STEP ******** >> > >> > >> > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... >> > NUMBER OF WORKING PROCESSES = 16 >> > OUT-OF-CORE OPTION (ICNTL(22)) = 0 >> > REAL SPACE FOR FACTORS = 565185337 >> > INTEGER SPACE FOR FACTORS = 3537003 >> > MAXIMUM FRONTAL SIZE (ESTIMATED) = 15239 >> > NUMBER OF NODES IN THE TREE = 7914 >> > Convergence error after scaling for ONE-NORM (option 7/8) = 0.79D+00 >> > Maximum effective relaxed size of S = 199523439 >> > Average effective relaxed size of S = 98303057 >> > >> > REDISTRIB: TOTAL DATA LOCAL/SENT = 657185 14022665 >> > GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.4805 >> > ** Memory relaxation parameter ( ICNTL(14) ) : 35 >> > ** Rank of processor needing largest memory in facto : 0 >> > ** Space in MBYTES used by this processor for facto : 3661 >> > ** Avg. Space in MBYTES per working proc during facto : 2018 >> > >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From gokhalen at gmail.com Thu Dec 13 17:03:13 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Thu, 13 Dec 2012 18:03:13 -0500 Subject: [petsc-users] MUMPS Stuck In-Reply-To: References: Message-ID: The factorizations seem to be going through. It seem to take 40 mins or so per factorization. -Nachiket On Thu, Dec 13, 2012 at 5:19 PM, Matthew Knepley wrote: > On Thu, Dec 13, 2012 at 1:44 PM, Nachiket Gokhale > wrote: > > Thanks - should I attached the debugger in debug mode or in optimized > mode? > > I suspect it will be tremendously slow in debug mode, otoh I am not sure > if > > it will yield any useful information in optimized mode. > > Optimized will still give a stack trace. > > > Also, will -on_error_attach_debugger do the trick? > > No, either spawn one -start_in_debugger -debugger_nodes 0, or attach > using gdb -p > > Matt > > > -Nachiket > > > > On Thu, Dec 13, 2012 at 4:29 PM, Matthew Knepley > wrote: > >> > >> On Thu, Dec 13, 2012 at 1:20 PM, Nachiket Gokhale > >> wrote: > >> > I am trying to solve a complex matrix equation which was assembled > using > >> > MatCompositeMerge using MUMPS and LU preconditioner. It seems to me > >> > that > >> > the solve is stuck in the factorization phase. It is taking 20 mins or > >> > so, > >> > using 16 processes. A problem of the same size using reals instead of > >> > complex was solved previously in approximately a minute using 4 > >> > processes. > >> > Mumps output of -mat_mumps_icntl_4 1 at the end of this email. Does > >> > anyone > >> > have any ideas about what the problem maybe ? > >> > >> Complex arithmetic is much more expensive, and you can lose some of > >> the optimizations > >> made in the code. I think you have to wait longer than this. Also, you > >> should try attaching > >> the debugger to a process to see whether it is computing or waiting. 
> >> > >> Matt > >> > >> > Thanks, > >> > > >> > -Nachiket > >> > > >> > > >> > > >> > Entering ZMUMPS driver with JOB, N, NZ = 1 122370 > 0 > >> > > >> > ZMUMPS 4.10.0 > >> > L U Solver for unsymmetric matrices > >> > Type of parallelism: Working host > >> > > >> > ****** ANALYSIS STEP ******** > >> > > >> > ** Max-trans not allowed because matrix is distributed > >> > ... Structural symmetry (in percent)= 100 > >> > Density: NBdense, Average, Median = 0 42 26 > >> > Ordering based on METIS > >> > A root of estimated size 2736 has been selected for > Scalapack. > >> > > >> > Leaving analysis phase with ... > >> > INFOG(1) = 0 > >> > INFOG(2) = 0 > >> > -- (20) Number of entries in factors (estim.) = 563723522 > >> > -- (3) Storage of factors (REAL, estimated) = 565185337 > >> > -- (4) Storage of factors (INT , estimated) = 3537003 > >> > -- (5) Maximum frontal size (estimated) = 15239 > >> > -- (6) Number of nodes in the tree = 7914 > >> > -- (32) Type of analysis effectively used = 1 > >> > -- (7) Ordering option effectively used = 5 > >> > ICNTL(6) Maximum transversal option = 0 > >> > ICNTL(7) Pivot order option = 7 > >> > Percentage of memory relaxation (effective) = 35 > >> > Number of level 2 nodes = 35 > >> > Number of split nodes = 8 > >> > RINFOG(1) Operations during elimination (estim)= 4.877D+12 > >> > Distributed matrix entry format (ICNTL(18)) = 3 > >> > ** Rank of proc needing largest memory in IC facto : 0 > >> > ** Estimated corresponding MBYTES for IC facto : 3661 > >> > ** Estimated avg. MBYTES per work. proc at facto (IC) : 2018 > >> > ** TOTAL space in MBYTES for IC factorization : 32289 > >> > ** Rank of proc needing largest memory for OOC facto : 0 > >> > ** Estimated corresponding MBYTES for OOC facto : 3462 > >> > ** Estimated avg. MBYTES per work. proc at facto (OOC) : 1787 > >> > ** TOTAL space in MBYTES for OOC factorization : 28599 > >> > Entering ZMUMPS driver with JOB, N, NZ = 2 122370 > 5211070 > >> > > >> > ****** FACTORIZATION STEP ******** > >> > > >> > > >> > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... > >> > NUMBER OF WORKING PROCESSES = 16 > >> > OUT-OF-CORE OPTION (ICNTL(22)) = 0 > >> > REAL SPACE FOR FACTORS = 565185337 > >> > INTEGER SPACE FOR FACTORS = 3537003 > >> > MAXIMUM FRONTAL SIZE (ESTIMATED) = 15239 > >> > NUMBER OF NODES IN THE TREE = 7914 > >> > Convergence error after scaling for ONE-NORM (option 7/8) = > 0.79D+00 > >> > Maximum effective relaxed size of S = 199523439 > >> > Average effective relaxed size of S = 98303057 > >> > > >> > REDISTRIB: TOTAL DATA LOCAL/SENT = 657185 14022665 > >> > GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.4805 > >> > ** Memory relaxation parameter ( ICNTL(14) ) : 35 > >> > ** Rank of processor needing largest memory in facto : 0 > >> > ** Space in MBYTES used by this processor for facto : 3661 > >> > ** Avg. Space in MBYTES per working proc during facto : 2018 > >> > > >> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which > >> their experiments lead. > >> -- Norbert Wiener > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From malexe at vt.edu Fri Dec 14 04:59:56 2012 From: malexe at vt.edu (Mihai Alexe) Date: Fri, 14 Dec 2012 11:59:56 +0100 Subject: [petsc-users] Is MatSetUp required with MatCreateNormal and MatCreateMPIAIJWithSplitArrays? In-Reply-To: <269A4421-4AB1-4829-A578-5B66CEF5C6C9@mcs.anl.gov> References: <269A4421-4AB1-4829-A578-5B66CEF5C6C9@mcs.anl.gov> Message-ID: Barry, I've tracked down the problem. I ran with -info -mat_view_info, and fpe's enabled and got a SIGFPE after entering MatCreateMPIAIJWithSplitArrays (Petsc did not produce a stacktrace unfortunately). This was due to a floating point exception in a typecast inside mat/interface/matrix.c: if (mat->ops->getinfo) { MatInfo info; ierr = MatGetInfo(mat,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr); ierr = PetscViewerASCIIPrintf(viewer,"*total: nonzeros=%D*, allocated nonzeros=%D\n",*(PetscInt)info.nz_used* ,(PetscInt)info.nz_allocated);CHKERRQ(ierr); ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); } My sparse matrix has about 6 billion nonzeros. When I disable FPEs, i get a silent overflow when converting MatInfo.nz_used from PetscLogDouble to (32-bit) PetscInt: Matrix Object: 96 MPI processes type: mpiaij rows=131857963, cols=18752388 total: *nonzeros=-2147483648*, allocated nonzeros=0 and the code runs just fine. Maybe PETSc should cast nz_used to a long int? Mihai On Thu, Nov 29, 2012 at 6:25 PM, Barry Smith wrote: > > On Nov 29, 2012, at 9:48 AM, Mihai Alexe wrote: > > > Hello all, > > > > I am creating a large rectangular MPIAIJ matrix, then a shell > NormalMatrix that eventually gets passed to a KSP object (all part of a > constrained least-squares solver). > > Code looks as follows: > > > > //user.A_mat and user.Hess are PETSc Mat > > > > info = MatCreateMPIAIJWithSplitArrays( PETSC_COMM_WORLD, *locrow, > *loccol, nrow, > > *ncol, onrowidx, oncolidx, > > (PetscScalar*) onvals, offrowidx, > offcolidx, > > (PetscScalar*) values, &user.A_mat ); > CHKERRQ(info); > > > > info = MatCreateNormal( user.A_mat, &user.Hess ); CHKERRQ(info); > > info = MatSetUp( user.Hess ); > > > > Is MatSetUp() required for A or Hess to be initialized correctly? Or > some call to MatSetPreallocation? > ' > No you shouldn't need them. 
Try with valgrind > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > Barry > > > > > My code crashes after displaying (with -info -mat_view_info): > > > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399113024 > 67425648 max tags = 2147483647 > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399112000 > 67760592 max tags = 2147483647 > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to > -mat_no_inode > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage > space: 0 unneeded,34572269 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=1508490 > > total: nonzeros=34572269, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47534399112000 67760592 > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to > -mat_no_inode > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage > space: 0 unneeded,1762711 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=18752388 > > total: nonzeros=1762711, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage > space: 0 unneeded,34572269 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=1508490 > > total: nonzeros=34572269, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage > space: 0 unneeded,1762711 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=18752388 > > total: nonzeros=1762711, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage > space: 0 unneeded,34572269 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47534399112000 67760592 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47534399112000 67760592 > > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > > [0] VecScatterCreate(): General case: MPI to Seq > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 38109; storage > space: 0 unneeded,1762711 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > Matrix Object: 160 MPI processes > > type: mpiaij > > rows=131858910, cols=18752388 > > > > The code ran just fine on a smaller (pruned) input dataset. > > I don't get a stacktrace unfortunately... 
(running in production mode, > trying to switch to debug mode now). > > > > > > Regards, > > Mihai > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Dec 14 07:28:34 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 14 Dec 2012 07:28:34 -0600 Subject: [petsc-users] Is MatSetUp required with MatCreateNormal and MatCreateMPIAIJWithSplitArrays? In-Reply-To: References: <269A4421-4AB1-4829-A578-5B66CEF5C6C9@mcs.anl.gov> Message-ID: <3647C5B0-A0C4-4BB9-8017-371BA2AB8B68@mcs.anl.gov> Mihai, Thanks for tracking down the problem. As a side note, you are getting close to using all of the space in int in your matrix row/column sizes, when you matrix sizes are great than 2^{31}-1 you will need to configure PETSc with --with-64-bit-indices to have PETSc use long long int for PetscInt. Satish, Could you please patch 3.3 and replace the use of %D with %lld and replace the (PetscInt) casts with (long long int) casts in the two lines ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%D, allocated nonzeros=%D\n",(PetscInt)info.nz_used,(PetscInt)info.nz_allocated);CHKERRQ(ierr); ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); Thanks Barry On Dec 14, 2012, at 4:59 AM, Mihai Alexe wrote: > Barry, > > I've tracked down the problem. > > I ran with -info -mat_view_info, and fpe's enabled and got a SIGFPE after entering MatCreateMPIAIJWithSplitArrays (Petsc did not produce a stacktrace unfortunately). This was due to a floating point exception in a typecast inside mat/interface/matrix.c: > > if (mat->ops->getinfo) { > MatInfo info; > ierr = MatGetInfo(mat,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr); > ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%D, allocated nonzeros=%D\n",(PetscInt)info.nz_used,(PetscInt)info.nz_allocated);CHKERRQ(ierr); > ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); > } > > My sparse matrix has about 6 billion nonzeros. When I disable FPEs, i get a silent overflow when converting MatInfo.nz_used from PetscLogDouble to (32-bit) PetscInt: > > Matrix Object: 96 MPI processes > type: mpiaij > rows=131857963, cols=18752388 > total: nonzeros=-2147483648, allocated nonzeros=0 > > and the code runs just fine. Maybe PETSc should cast nz_used to a long int? > > > Mihai > On Thu, Nov 29, 2012 at 6:25 PM, Barry Smith wrote: > > On Nov 29, 2012, at 9:48 AM, Mihai Alexe wrote: > > > Hello all, > > > > I am creating a large rectangular MPIAIJ matrix, then a shell NormalMatrix that eventually gets passed to a KSP object (all part of a constrained least-squares solver). > > Code looks as follows: > > > > //user.A_mat and user.Hess are PETSc Mat > > > > info = MatCreateMPIAIJWithSplitArrays( PETSC_COMM_WORLD, *locrow, *loccol, nrow, > > *ncol, onrowidx, oncolidx, > > (PetscScalar*) onvals, offrowidx, offcolidx, > > (PetscScalar*) values, &user.A_mat ); CHKERRQ(info); > > > > info = MatCreateNormal( user.A_mat, &user.Hess ); CHKERRQ(info); > > info = MatSetUp( user.Hess ); > > > > Is MatSetUp() required for A or Hess to be initialized correctly? Or some call to MatSetPreallocation? > ' > No you shouldn't need them. 
Try with valgrind http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > Barry > > > > > My code crashes after displaying (with -info -mat_view_info): > > > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399113024 67425648 max tags = 2147483647 > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399112000 67760592 max tags = 2147483647 > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to -mat_no_inode > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=1508490 > > total: nonzeros=34572269, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592 > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to -mat_no_inode > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage space: 0 unneeded,1762711 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=18752388 > > total: nonzeros=1762711, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=1508490 > > total: nonzeros=34572269, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage space: 0 unneeded,1762711 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > Matrix Object: 1 MPI processes > > type: seqaij > > rows=8920860, cols=18752388 > > total: nonzeros=1762711, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592 > > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > > [0] VecScatterCreate(): General case: MPI to Seq > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 38109; storage space: 0 unneeded,1762711 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > Matrix Object: 160 MPI processes > > type: mpiaij > > rows=131858910, cols=18752388 > > > > The code ran just fine on a smaller (pruned) input dataset. > > I don't get a stacktrace unfortunately... 
(running in production mode, trying to switch to debug mode now). > > > > > > Regards, > > Mihai > > > > From malexe at vt.edu Fri Dec 14 07:39:35 2012 From: malexe at vt.edu (Mihai Alexe) Date: Fri, 14 Dec 2012 14:39:35 +0100 Subject: [petsc-users] Is MatSetUp required with MatCreateNormal and MatCreateMPIAIJWithSplitArrays? In-Reply-To: <3647C5B0-A0C4-4BB9-8017-371BA2AB8B68@mcs.anl.gov> References: <269A4421-4AB1-4829-A578-5B66CEF5C6C9@mcs.anl.gov> <3647C5B0-A0C4-4BB9-8017-371BA2AB8B68@mcs.anl.gov> Message-ID: Barry, Indeed. As a side remark, the number of unknowns for my least-squares problem is well within the maximum 32-bit integer limit. That's why I did not immediately think that 32-bit ints may cause a problem. It's only the matrix nonzero count that goes over that bound. Quick overview of my "A": mglb (rows) =131857963, nglb (cols) =18752388, nnz_glb (nonzeros) = 5812947924 Going to 64-bit integers is not really an option. Long story short, I am working in single-precision mode, and the PETSc code is called from a Fortran kernel where we have imposed an EQUIVALENCE between single precision floats and ints (legacy design...) Best, Mihai On Fri, Dec 14, 2012 at 2:28 PM, Barry Smith wrote: > > Mihai, > > Thanks for tracking down the problem. As a side note, you are getting > close to using all of the space in int in your matrix row/column sizes, > when you matrix sizes are great than 2^{31}-1 you will need to configure > PETSc with --with-64-bit-indices to have PETSc use long long int for > PetscInt. > > Satish, > > Could you please patch 3.3 and replace the use of %D with %lld and > replace the (PetscInt) casts with (long long int) casts in the two lines > ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%D, allocated > nonzeros=%D\n",(PetscInt)info.nz_used,(PetscInt)info.nz_allocated);CHKERRQ(ierr); > ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used > during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); > > Thanks > > Barry > > On Dec 14, 2012, at 4:59 AM, Mihai Alexe wrote: > > > Barry, > > > > I've tracked down the problem. > > > > I ran with -info -mat_view_info, and fpe's enabled and got a SIGFPE > after entering MatCreateMPIAIJWithSplitArrays (Petsc did not produce a > stacktrace unfortunately). This was due to a floating point exception in a > typecast inside mat/interface/matrix.c: > > > > if (mat->ops->getinfo) { > > MatInfo info; > > ierr = MatGetInfo(mat,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr); > > ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%D, > allocated > nonzeros=%D\n",(PetscInt)info.nz_used,(PetscInt)info.nz_allocated);CHKERRQ(ierr); > > ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs > used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); > > } > > > > My sparse matrix has about 6 billion nonzeros. When I disable FPEs, i > get a silent overflow when converting MatInfo.nz_used from PetscLogDouble > to (32-bit) PetscInt: > > > > Matrix Object: 96 MPI processes > > type: mpiaij > > rows=131857963, cols=18752388 > > total: nonzeros=-2147483648, allocated nonzeros=0 > > > > and the code runs just fine. Maybe PETSc should cast nz_used to a long > int? > > > > > > Mihai > > On Thu, Nov 29, 2012 at 6:25 PM, Barry Smith wrote: > > > > On Nov 29, 2012, at 9:48 AM, Mihai Alexe wrote: > > > > > Hello all, > > > > > > I am creating a large rectangular MPIAIJ matrix, then a shell > NormalMatrix that eventually gets passed to a KSP object (all part of a > constrained least-squares solver). 
> > > Code looks as follows: > > > > > > //user.A_mat and user.Hess are PETSc Mat > > > > > > info = MatCreateMPIAIJWithSplitArrays( PETSC_COMM_WORLD, *locrow, > *loccol, nrow, > > > *ncol, onrowidx, oncolidx, > > > (PetscScalar*) onvals, offrowidx, > offcolidx, > > > (PetscScalar*) values, &user.A_mat ); > CHKERRQ(info); > > > > > > info = MatCreateNormal( user.A_mat, &user.Hess ); CHKERRQ(info); > > > info = MatSetUp( user.Hess ); > > > > > > Is MatSetUp() required for A or Hess to be initialized correctly? Or > some call to MatSetPreallocation? > > ' > > No you shouldn't need them. Try with valgrind > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > > Barry > > > > > > > > My code crashes after displaying (with -info -mat_view_info): > > > > > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399113024 > 67425648 max tags = 2147483647 > > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399112000 > 67760592 max tags = 2147483647 > > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to > -mat_no_inode > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage > space: 0 unneeded,34572269 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=1508490 > > > total: nonzeros=34572269, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47534399112000 67760592 > > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to > -mat_no_inode > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage > space: 0 unneeded,1762711 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=18752388 > > > total: nonzeros=1762711, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage > space: 0 unneeded,34572269 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=1508490 > > > total: nonzeros=34572269, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage > space: 0 unneeded,1762711 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=18752388 > > > total: nonzeros=1762711, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage > space: 0 unneeded,34572269 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > > [0] PetscCommDuplicate(): Using internal PETSc 
communicator > 47534399112000 67760592 > > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47534399112000 67760592 > > > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > > > [0] VecScatterCreate(): General case: MPI to Seq > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 38109; storage > space: 0 unneeded,1762711 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > > Matrix Object: 160 MPI processes > > > type: mpiaij > > > rows=131858910, cols=18752388 > > > > > > The code ran just fine on a smaller (pruned) input dataset. > > > I don't get a stacktrace unfortunately... (running in production mode, > trying to switch to debug mode now). > > > > > > > > > Regards, > > > Mihai > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Dec 14 10:26:03 2012 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 14 Dec 2012 10:26:03 -0600 (CST) Subject: [petsc-users] Is MatSetUp required with MatCreateNormal and MatCreateMPIAIJWithSplitArrays? In-Reply-To: <3647C5B0-A0C4-4BB9-8017-371BA2AB8B68@mcs.anl.gov> References: <269A4421-4AB1-4829-A578-5B66CEF5C6C9@mcs.anl.gov> <3647C5B0-A0C4-4BB9-8017-371BA2AB8B68@mcs.anl.gov> Message-ID: pushed https://bitbucket.org/petsc/petsc-3.3/commits/6dac937a3eace3b81d6dcbb945ee7a85 Satish On Fri, 14 Dec 2012, Barry Smith wrote: > > Mihai, > > Thanks for tracking down the problem. As a side note, you are getting close to using all of the space in int in your matrix row/column sizes, when you matrix sizes are great than 2^{31}-1 you will need to configure PETSc with --with-64-bit-indices to have PETSc use long long int for PetscInt. > > Satish, > > Could you please patch 3.3 and replace the use of %D with %lld and replace the (PetscInt) casts with (long long int) casts in the two lines > ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%D, allocated nonzeros=%D\n",(PetscInt)info.nz_used,(PetscInt)info.nz_allocated);CHKERRQ(ierr); > ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); > > Thanks > > Barry > > On Dec 14, 2012, at 4:59 AM, Mihai Alexe wrote: > > > Barry, > > > > I've tracked down the problem. > > > > I ran with -info -mat_view_info, and fpe's enabled and got a SIGFPE after entering MatCreateMPIAIJWithSplitArrays (Petsc did not produce a stacktrace unfortunately). This was due to a floating point exception in a typecast inside mat/interface/matrix.c: > > > > if (mat->ops->getinfo) { > > MatInfo info; > > ierr = MatGetInfo(mat,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr); > > ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%D, allocated nonzeros=%D\n",(PetscInt)info.nz_used,(PetscInt)info.nz_allocated);CHKERRQ(ierr); > > ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr); > > } > > > > My sparse matrix has about 6 billion nonzeros. When I disable FPEs, i get a silent overflow when converting MatInfo.nz_used from PetscLogDouble to (32-bit) PetscInt: > > > > Matrix Object: 96 MPI processes > > type: mpiaij > > rows=131857963, cols=18752388 > > total: nonzeros=-2147483648, allocated nonzeros=0 > > > > and the code runs just fine. Maybe PETSc should cast nz_used to a long int? 
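For reference, the change Barry asks for above amounts to roughly the following in mat/interface/matrix.c. This is only a sketch: the format is widened to %lld and the casts to long long int so that counts above 2^31-1 are no longer truncated, and the patch actually pushed may differ in detail.

   if (mat->ops->getinfo) {
     MatInfo info;
     ierr = MatGetInfo(mat,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr);
     /* print the counters through long long int instead of truncating to a 32-bit PetscInt */
     ierr = PetscViewerASCIIPrintf(viewer,"total: nonzeros=%lld, allocated nonzeros=%lld\n",(long long int)info.nz_used,(long long int)info.nz_allocated);CHKERRQ(ierr);
     ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%lld\n",(long long int)info.mallocs);CHKERRQ(ierr);
   }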
> > > > > > Mihai > > On Thu, Nov 29, 2012 at 6:25 PM, Barry Smith wrote: > > > > On Nov 29, 2012, at 9:48 AM, Mihai Alexe wrote: > > > > > Hello all, > > > > > > I am creating a large rectangular MPIAIJ matrix, then a shell NormalMatrix that eventually gets passed to a KSP object (all part of a constrained least-squares solver). > > > Code looks as follows: > > > > > > //user.A_mat and user.Hess are PETSc Mat > > > > > > info = MatCreateMPIAIJWithSplitArrays( PETSC_COMM_WORLD, *locrow, *loccol, nrow, > > > *ncol, onrowidx, oncolidx, > > > (PetscScalar*) onvals, offrowidx, offcolidx, > > > (PetscScalar*) values, &user.A_mat ); CHKERRQ(info); > > > > > > info = MatCreateNormal( user.A_mat, &user.Hess ); CHKERRQ(info); > > > info = MatSetUp( user.Hess ); > > > > > > Is MatSetUp() required for A or Hess to be initialized correctly? Or some call to MatSetPreallocation? > > ' > > No you shouldn't need them. Try with valgrind http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > > Barry > > > > > > > > My code crashes after displaying (with -info -mat_view_info): > > > > > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399113024 67425648 max tags = 2147483647 > > > [0] PetscCommDuplicate(): Duplicating a communicator 47534399112000 67760592 max tags = 2147483647 > > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to -mat_no_inode > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=1508490 > > > total: nonzeros=34572269, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592 > > > [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to -mat_no_inode > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage space: 0 unneeded,1762711 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=18752388 > > > total: nonzeros=1762711, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=1508490 > > > total: nonzeros=34572269, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage space: 0 unneeded,1762711 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > > Matrix Object: 1 MPI processes > > > type: seqaij > > > rows=8920860, cols=18752388 > > > total: nonzeros=1762711, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > [0] 
MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615 > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592 > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592 > > > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > > > [0] VecScatterCreate(): General case: MPI to Seq > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 38109; storage space: 0 unneeded,1762711 used > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349 > > > Matrix Object: 160 MPI processes > > > type: mpiaij > > > rows=131858910, cols=18752388 > > > > > > The code ran just fine on a smaller (pruned) input dataset. > > > I don't get a stacktrace unfortunately... (running in production mode, trying to switch to debug mode now). > > > > > > > > > Regards, > > > Mihai > > > > > > > > > From gokhalen at gmail.com Mon Dec 17 15:54:44 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Mon, 17 Dec 2012 16:54:44 -0500 Subject: [petsc-users] MatMatMult Question Message-ID: I am trying to multiply the transpose of a matrix with another matrix using matmatmult. The transpose operator is created using MatTranspose. I get the error at the end of the email. Is this an error saying that cols of matrix 1 and not equal to the rows of matrix 2? I checked and the rows and columns seem to allow matrix multiplication: Left = (54,1760), Right=(1760,54). The snipped of the code which produces this is ierr = MatMatMult(*KFullMat,ProjR,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&TempMat);CHKERRQ(ierr); ierr = MatGetSize(ProjLT,&nrowl,&ncoll); CHKERRQ(ierr); ierr = MatGetSize(TempMat,&nrowr,&ncolr); CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD,"Left = (%d,%d), Right=(%d,%d)\n",nrowl,ncoll,nrowr,ncolr); CHKERRQ(ierr); ierr = MatMatMult(ProjLT,TempMat,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&KProj); CHKERRQ(ierr); ierr = MatDestroy(&TempMat); Thanks, -Nachiket [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Arguments are incompatible! [0]PETSC ERROR: MatMatMult requires A, seqdense, to be compatible with B, seqdense! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Mon Dec 17 16:58:13 2012 [0]PETSC ERROR: Libraries linked from /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib [0]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 [0]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: MatMatMult() line 8601 in /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c [0]PETSC ERROR: waigensolvprojforc() line 33 in src/examples/waigensolvprojforc.c -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Mon Dec 17 22:05:19 2012 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 17 Dec 2012 22:05:19 -0600 Subject: [petsc-users] MatMatMult Question In-Reply-To: References: Message-ID: Nachiket : Which version of petsc is used? Did you run the code in sequential? > ------------------------------------ > [0]PETSC ERROR: Arguments are incompatible! > [0]PETSC ERROR: MatMatMult requires A, seqdense, to be compatible with B, > seqdense! [0]PETSC ERROR: MatMatMult() line 8601 in /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c The line (8601) does not match the latest petsc-3.3 and petsc-dev. I cannot reproduce this error from the latest petsc-3.3 and petsc-dev. The error complains about a failure to find MatMatMult_SeqDense_SeqDense(). Can you update to the latest petsc-3.3 or petsc-dev and see if your code still crashes? Hong > I am trying to multiply the transpose of a matrix with another matrix using > matmatmult. The transpose operator is created using MatTranspose. I get the > error at the end of the email. Is this an error saying that cols of matrix 1 > and not equal to the rows of matrix 2? I checked and the rows and columns > seem to allow matrix multiplication: Left = (54,1760), Right=(1760,54). The > snipped of the code which produces this is > > ierr = > MatMatMult(*KFullMat,ProjR,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&TempMat);CHKERRQ(ierr); > ierr = MatGetSize(ProjLT,&nrowl,&ncoll); CHKERRQ(ierr); > ierr = MatGetSize(TempMat,&nrowr,&ncolr); CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD,"Left = (%d,%d), > Right=(%d,%d)\n",nrowl,ncoll,nrowr,ncolr); CHKERRQ(ierr); > ierr = MatMatMult(ProjLT,TempMat,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&KProj); > CHKERRQ(ierr); > ierr = MatDestroy(&TempMat); > > Thanks, > > -Nachiket > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Arguments are incompatible! > [0]PETSC ERROR: MatMatMult requires A, seqdense, to be compatible with B, > seqdense! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 > CDT 2012 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a > linux-gcc named asd1.wai.com by gokhale Mon Dec 17 16:58:13 2012 > [0]PETSC ERROR: Libraries linked from > /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib > [0]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 > [0]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 > --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ > --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 > --download-parmetis=1 --download-metis --download-scalapack=1 > --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: MatMatMult() line 8601 in > /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c > [0]PETSC ERROR: waigensolvprojforc() line 33 in > src/examples/waigensolvprojforc.c > From gokhalen at gmail.com Tue Dec 18 09:36:30 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Tue, 18 Dec 2012 10:36:30 -0500 Subject: [petsc-users] MatMatMult Question In-Reply-To: References: Message-ID: Hong: I used 3.3-p2 but I was able to reproduce this error with 3.3-p5 as well. I ran it with one MPI process. I got the same error, [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Arguments are incompatible! [0]PETSC ERROR: MatMatMult requires A, seqdense, to be compatible with B, seqdense! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 5, Sat Dec 1 15:10:41 CST 2012 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Tue Dec 18 10:44:18 2012 [0]PETSC ERROR: Libraries linked from /opt/petsc/petsc-3.3-p5/linux-gcc-g++-mpich-mumps-complex-debug/lib [0]PETSC ERROR: Configure run at Tue Dec 18 10:09:32 2012 [0]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: MatMatMult() line 8617 in /opt/petsc/petsc-3.3-p5/src/mat/interface/matrix.c [0]PETSC ERROR: waigensolvprojforc() line 31 in src/examples/waigensolvprojforc.c Using one MPI process, this error goes away when I make a temporary matrix and store the result of the first multiplication in it as in: ierr = MatLoad(ProjR,viewer);CHKERRQ(ierr); ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr); ierr = MatMatMult(*KFullMat,ProjR,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&TempMat);CHKERRQ(ierr); ierr = MatDuplicate(TempMat,MAT_COPY_VALUES,&TempMat2); CHKERRQ(ierr); ierr = MatMatMult(ProjLT,TempMat2,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&KProj); CHKERRQ(ierr); ierr = MatDestroy(&TempMat);CHKERRQ(ierr); ierr = MatDestroy(&TempMat2);CHKERRQ(ierr); If I run more than one process the error returns. [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Arguments are incompatible! [0]PETSC ERROR: MatMatMult requires A, mpidense, to be compatible with B, mpidense! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 5, Sat Dec 1 15:10:41 CST 2012 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a linux-gcc named asd1.wai.com by gokhale Tue Dec 18 10:42:15 2012 [0]PETSC ERROR: Libraries linked from /opt/petsc/petsc-3.3-p5/linux-gcc-g++-mpich-mumps-complex-debug/lib [0]PETSC ERROR: Configure run at Tue Dec 18 10:09:32 2012 [0]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 --download-parmetis=1 --download-metis --download-scalapack=1 --download-blacs=1 --with-cmake=/usr/bin/cmake28 --with-scalar-type=complex [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: MatMatMult() line 8617 in /opt/petsc/petsc-3.3-p5/src/mat/interface/matrix.c [0]PETSC ERROR: waigensolvprojforc() line 31 in src/examples/waigensolvprojforc.c If it matters I am using Petsc through SlepC-3.3-p3 -Nachiket On Mon, Dec 17, 2012 at 11:05 PM, Hong Zhang wrote: > Nachiket : > > Which version of petsc is used? Did you run the code in sequential? > > > ------------------------------------ > > [0]PETSC ERROR: Arguments are incompatible! 
> > [0]PETSC ERROR: MatMatMult requires A, seqdense, to be compatible with B, > > seqdense! > [0]PETSC ERROR: MatMatMult() line 8601 in > /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c > > The line (8601) does not match the latest petsc-3.3 and petsc-dev. > I cannot reproduce this error from the latest petsc-3.3 and petsc-dev. > > The error complains about a failure to find MatMatMult_SeqDense_SeqDense(). > Can you update to the latest petsc-3.3 or petsc-dev and see if your > code still crashes? > > Hong > > > I am trying to multiply the transpose of a matrix with another matrix > using > > matmatmult. The transpose operator is created using MatTranspose. I get > the > > error at the end of the email. Is this an error saying that cols of > matrix 1 > > and not equal to the rows of matrix 2? I checked and the rows and columns > > seem to allow matrix multiplication: Left = (54,1760), Right=(1760,54). > The > > snipped of the code which produces this is > > > > ierr = > > > MatMatMult(*KFullMat,ProjR,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&TempMat);CHKERRQ(ierr); > > ierr = MatGetSize(ProjLT,&nrowl,&ncoll); CHKERRQ(ierr); > > ierr = MatGetSize(TempMat,&nrowr,&ncolr); CHKERRQ(ierr); > > ierr = PetscPrintf(PETSC_COMM_WORLD,"Left = (%d,%d), > > Right=(%d,%d)\n",nrowl,ncoll,nrowr,ncolr); CHKERRQ(ierr); > > ierr = > MatMatMult(ProjLT,TempMat,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&KProj); > > CHKERRQ(ierr); > > ierr = MatDestroy(&TempMat); > > > > Thanks, > > > > -Nachiket > > > > [0]PETSC ERROR: --------------------- Error Message > > ------------------------------------ > > [0]PETSC ERROR: Arguments are incompatible! > > [0]PETSC ERROR: MatMatMult requires A, seqdense, to be compatible with B, > > seqdense! > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 > > CDT 2012 > > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > > [0]PETSC ERROR: See docs/index.html for manual pages. > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: /home/gokhale/WAIGEN/GDEB-WAIGEN2012/bin/waigen on a > > linux-gcc named asd1.wai.com by gokhale Mon Dec 17 16:58:13 2012 > > [0]PETSC ERROR: Libraries linked from > > /opt/petsc/petsc-3.3-p2/linux-gcc-g++-mpich-mumps-complex-debug/lib > > [0]PETSC ERROR: Configure run at Mon Oct 29 18:41:24 2012 > > [0]PETSC ERROR: Configure options --with-x=0 --with-mpi=1 > > --download-mpich=yes --with-x11=0 --with-debugging=1 --with-clanguage=C++ > > --with-shared-libraries=1 --download-mumps=yes --download-f-blas-lapack=1 > > --download-parmetis=1 --download-metis --download-scalapack=1 > > --download-blacs=1 --with-cmake=/usr/bin/cmake28 > --with-scalar-type=complex > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: MatMatMult() line 8601 in > > /opt/petsc/petsc-3.3-p2/src/mat/interface/matrix.c > > [0]PETSC ERROR: waigensolvprojforc() line 33 in > > src/examples/waigensolvprojforc.c > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaurish108 at gmail.com Tue Dec 18 20:17:54 2012 From: gaurish108 at gmail.com (Gaurish Telang) Date: Tue, 18 Dec 2012 21:17:54 -0500 Subject: [petsc-users] Simple query about GPU usage in PETSc Message-ID: I am trying out PETSc's GPU features for the first time. 
After skimming, a paper on the PETSc-GPU interface. http://www.stanford.edu/~vminden/docs/gpus.pdf I just wanted to confirm whether the following observation is correct. Suppose I want to solve Ax=b and set the PETSc vector- and matrix-type from the command-line Then to make my code run on the GPU, *all* I need to do is to (1) set the "-vec_type" at the command-line as "seqcusp" or "mpicusp" (depending on whether I am using a single/multiple GPU process ) (2) set the "-mat_type" at the command-line as "seqaijcusp" or " mpiaijcusp" (depending on whether I am using a single/multiple CPU process ) (3) Solving the system Ax=b is done the "usual" way (see below) i.e nothing CUDA specific. ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); ierr = PCSetType(pc,PCJACOBI);CHKERRQ(ierr); ierr = KSPSetTolerances(ksp,1.e-5,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); if (nonzeroguess) { PetscScalar p = .5; ierr = VecSet(x,p);CHKERRQ(ierr); ierr = KSPSetInitialGuessNonzero(ksp,PETSC_TRUE);CHKERRQ(ierr); } ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); (4) Looking at the type of the vector and the matrix, PETSc hands over the control to the corresponding CUSP solver. Thank you, Gaurish -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Dec 18 20:58:54 2012 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 18 Dec 2012 21:58:54 -0500 Subject: [petsc-users] Simple query about GPU usage in PETSc In-Reply-To: References: Message-ID: On Tue, Dec 18, 2012 at 9:17 PM, Gaurish Telang wrote: > I am trying out PETSc's GPU features for the first time. > > After skimming, a paper on the PETSc-GPU interface. > http://www.stanford.edu/~vminden/docs/gpus.pdf > > I just wanted to confirm whether the following observation is correct. > > Suppose I want to solve Ax=b and set the PETSc vector- and matrix-type > from the command-line > > Then to make my code run on the GPU, *all* I need to do is to > (1) set the "-vec_type" at the command-line as "seqcusp" or "mpicusp" > (depending on whether I am using a single/multiple GPU process ) > (2) set the "-mat_type" at the command-line as "seqaijcusp" or > "mpiaijcusp" (depending on whether I am using a single/multiple CPU process > ) > (3) Solving the system Ax=b is done the "usual" way (see below) i.e > nothing CUDA specific. > > ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); > ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); > ierr = PCSetType(pc,PCJACOBI);CHKERRQ(ierr); > ierr = > KSPSetTolerances(ksp,1.e-5,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); > > if (nonzeroguess) { > PetscScalar p = .5; > ierr = VecSet(x,p);CHKERRQ(ierr); > ierr = KSPSetInitialGuessNonzero(ksp,PETSC_TRUE);CHKERRQ(ierr); > } > ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); > > (4) Looking at the type of the vector and the matrix, PETSc hands over the > control to the corresponding CUSP solver. Yes, that should work. Matt > Thank you, > > Gaurish -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener From daniel.arndt at stud.uni-goettingen.de Wed Dec 19 09:00:22 2012 From: daniel.arndt at stud.uni-goettingen.de (Daniel Arndt) Date: Wed, 19 Dec 2012 16:00:22 +0100 Subject: [petsc-users] early convergence failure In-Reply-To: <7B5C8583-1A21-49C3-B8B3-FF708B84D4F8@mcs.anl.gov> References: <7B5C8583-1A21-49C3-B8B3-FF708B84D4F8@mcs.anl.gov> Message-ID: <50D1D686.8030307@stud.uni-goettingen.de> >>/ Thank you Barry for your suggestions. />/> //The error I get is now KSP_DIVERGED_INDEFINITE_PC. The matrix that I try to invert is actually symmetric and positive definite. I was not aware that this can lead to a indefinite preconditioner. / > Absolutely. Many preconditioners do not retain this feature even in exact precision and with numerical effects it can even appear unexpected. > By default BoomAMG doesn't retain this. Is there then a possibility to tell the BlockJacobi preconditioner to be positive definit as there is for the BoomerAMG preconditioner via -pc_hypre_boomeramg_relax_type_all symmetric-SOR/Jacobi? Bests Daniel // -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Dec 19 09:06:05 2012 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 19 Dec 2012 10:06:05 -0500 Subject: [petsc-users] early convergence failure In-Reply-To: <50D1D686.8030307@stud.uni-goettingen.de> References: <7B5C8583-1A21-49C3-B8B3-FF708B84D4F8@mcs.anl.gov> <50D1D686.8030307@stud.uni-goettingen.de> Message-ID: On Wed, Dec 19, 2012 at 10:00 AM, Daniel Arndt wrote: >>> Thank you Barry for your suggestions. >>> The error I get is now KSP_DIVERGED_INDEFINITE_PC. The matrix that I try >>> to invert is actually symmetric and positive definite. I was not aware that >>> this can lead to a indefinite preconditioner. > >> Absolutely. Many preconditioners do not retain this feature even in >> exact precision and with numerical effects it can even appear unexpected. >> By default BoomAMG doesn't retain this. > > Is there then a possibility to tell the BlockJacobi preconditioner to be > positive definit as there is for the BoomerAMG preconditioner via > -pc_hypre_boomeramg_relax_type_all symmetric-SOR/Jacobi? Block-Jacobi is just a container. You can choose the inner solver to respect this. Matt > Bests > Daniel > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From stefan.kurzbach at tuhh.de Wed Dec 19 10:25:09 2012 From: stefan.kurzbach at tuhh.de (Stefan Kurzbach) Date: Wed, 19 Dec 2012 17:25:09 +0100 Subject: [petsc-users] Direct Schur complement domain decomposition Message-ID: <002b01cdde05$6a2d0330$3e870990$@tuhh.de> Hello everybody, in my recent research on parallelization of a 2D unstructured flow model code I came upon a question on domain decomposition techniques in "grids". Maybe someone knows of any previous results on this? Typically, when doing large simulations with many unknowns, the problem is distributed to many computer nodes and solved in parallel by some iterative method. Many of these iterative methods boil down to a large number of distributed matrix-vector multiplications (in the order of the number of iterations). This means there are many synchronization points in the algorithms, which makes them tightly coupled. This has been found to work well on clusters with fast networks. 
Now my question: What if there is a small number of very powerful nodes (say less than 10), which are connected by a slow network, e.g. several computer clusters connected over the internet (some people call this "grid computing"). I expect that the traditional iterative methods will not be as efficient here (any references?). My guess is that a solution method with fewer synchronization points will work better, even though that method may be computationally more expensive than traditional methods. An example would be a domain composition approach with direct solution of the Schur complement on the interface. This requires that the interface size has to be small compared to the subdomain size. As this algorithm basically works in three decoupled phases (solve the subdomains for several right hand sides, assemble and solve the Schur complement system, correct the subdomain results) it should be suited well, but I have no idea how to test or otherwise prove it. Has anybody made any thoughts on this before, possibly dating back to the 80ies and 90ies, where slow networks were more common? Best regards Stefan -------------- next part -------------- An HTML attachment was scrubbed... URL: From dog at lanl.gov Wed Dec 19 15:54:17 2012 From: dog at lanl.gov (Gunter, David O) Date: Wed, 19 Dec 2012 21:54:17 +0000 Subject: [petsc-users] Compiling 3.3 for Open-MPI on a SLURM system Message-ID: I am trying to compile PETSc on a SLURM-based system using GCC, openmpi-1.6.3. Here's my configure line: $ ./configure --prefix=/tmp/dog/petsc-3.3-p5 --with-mpiexec=mpiexec configure bombs out on this test: TESTING: configureMPITypes from config.packages.MPI(/usr/aprojects/hpctools/dog/petsc/petsc-3.3-p5/config/BuildSystem/config/packages/MPI.py:230)srun: error: slurm_send_recv_rc_msg_only_one: Connection timed out srun: error: slurm_send_recv_rc_msg_only_one: Connection timed out It should not be trying to srun anything as we use mpiexec with Open-MPI. Any ideas? -david -- David Gunter HPC-3: Infrastructure Team Los Alamos National Laboratory From knepley at gmail.com Wed Dec 19 16:03:02 2012 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 19 Dec 2012 17:03:02 -0500 Subject: [petsc-users] Compiling 3.3 for Open-MPI on a SLURM system In-Reply-To: References: Message-ID: On Wed, Dec 19, 2012 at 4:54 PM, Gunter, David O wrote: > I am trying to compile PETSc on a SLURM-based system using GCC, openmpi-1.6.3. > > Here's my configure line: > > $ ./configure --prefix=/tmp/dog/petsc-3.3-p5 --with-mpiexec=mpiexec > > configure bombs out on this test: > > TESTING: configureMPITypes from config.packages.MPI(/usr/aprojects/hpctools/dog/petsc/petsc-3.3-p5/config/BuildSystem/config/packages/MPI.py:230)srun: error: slurm_send_recv_rc_msg_only_one: Connection timed out > srun: error: slurm_send_recv_rc_msg_only_one: Connection timed out > > It should not be trying to srun anything as we use mpiexec with Open-MPI. Any ideas? If you can't run anything, you need --with-batch for the configure. Thanks, Matt > -david > > -- > David Gunter > HPC-3: Infrastructure Team > Los Alamos National Laboratory > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener From balay at mcs.anl.gov Wed Dec 19 16:03:56 2012 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 19 Dec 2012 16:03:56 -0600 (CST) Subject: [petsc-users] Compiling 3.3 for Open-MPI on a SLURM system In-Reply-To: References: Message-ID: Perhaps mpiexec is invoking srun internally? The details should be in configure.log [petsc configure doesn't know about srun]. Satish On Wed, 19 Dec 2012, Gunter, David O wrote: > I am trying to compile PETSc on a SLURM-based system using GCC, openmpi-1.6.3. > > Here's my configure line: > > $ ./configure --prefix=/tmp/dog/petsc-3.3-p5 --with-mpiexec=mpiexec > > configure bombs out on this test: > > TESTING: configureMPITypes from config.packages.MPI(/usr/aprojects/hpctools/dog/petsc/petsc-3.3-p5/config/BuildSystem/config/packages/MPI.py:230)srun: error: slurm_send_recv_rc_msg_only_one: Connection timed out > srun: error: slurm_send_recv_rc_msg_only_one: Connection timed out > > It should not be trying to srun anything as we use mpiexec with Open-MPI. Any ideas? > > -david > > -- > David Gunter > HPC-3: Infrastructure Team > Los Alamos National Laboratory > > > > > From thomas.witkowski at tu-dresden.de Thu Dec 20 14:16:52 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 20 Dec 2012 21:16:52 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? Message-ID: <50D37234.2040205@tu-dresden.de> In my multilevel FETI-DP code, I have localized course matrices, which are defined on only a subset of all MPI tasks, typically between 4 and 64 tasks. The MatAIJ and the KSP objects are both defined on a MPI communicator, which is a subset of MPI::COMM_WORLD. The LU factorization of the matrices is computed with either MUMPS or superlu_dist, but both show some scaling property I really wonder of: When the overall problem size is increased, the solve with the LU factorization of the local matrices does not scale! But why not? I just increase the number of local matrices, but all of them are independent of each other. Some example: I use 64 cores, each coarse matrix is spanned by 4 cores so there are 16 MPI communicators with 16 coarse space matrices. The problem need to solve 192 times with the coarse space systems, and this takes together 0.09 seconds. Now I increase the number of cores to 256, but let the local coarse space be defined again on only 4 cores. Again, 192 solutions with these coarse spaces are required, but now this takes 0.24 seconds. The same for 1024 cores, and we are at 1.7 seconds for the local coarse space solver! For me, this is a total mystery! Any idea how to explain, debug and eventually how to resolve this problem? Thomas From Thomas.Witkowski at tu-dresden.de Thu Dec 20 14:19:59 2012 From: Thomas.Witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 20 Dec 2012 21:19:59 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? Message-ID: <20121220211959.9srb50dlcc4wgc4o@mail.zih.tu-dresden.de> In my multilevel FETI-DP code, I have localized course matrices, which are defined on only a subset of all MPI tasks, typically between 4 and 64 tasks. The MatAIJ and the KSP objects are both defined on a MPI communicator, which is a subset of MPI::COMM_WORLD. 
The LU factorization of the matrices is computed with either MUMPS or superlu_dist, but both show some scaling property I really wonder of: When the overall problem size is increased, the solve with the LU factorization of the local matrices does not scale! But why not? I just increase the number of local matrices, but all of them are independent of each other. Some example: I use 64 cores, each coarse matrix is spanned by 4 cores so there are 16 MPI communicators with 16 coarse space matrices. The problem need to solve 192 times with the coarse space systems, and this takes together 0.09 seconds. Now I increase the number of cores to 256, but let the local coarse space be defined again on only 4 cores. Again, 192 solutions with these coarse spaces are required, but now this takes 0.24 seconds. The same for 1024 cores, and we are at 1.7 seconds for the local coarse space solver! For me, this is a total mystery! Any idea how to explain, debug and eventually how to resolve this problem? Thomas From bsmith at mcs.anl.gov Thu Dec 20 14:23:45 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 20 Dec 2012 14:23:45 -0600 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <50D37234.2040205@tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> Message-ID: <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> Are you timing ONLY the time to factor and solve the subproblems? Or also the time to get the data to the collection of 4 cores at a time? If you are only using LU for these problems and not elsewhere in the code you can get the factorization and time from MatLUFactor() and MatSolve() or you can use stages to put this calculation in its own stage and use the MatLUFactor() and MatSolve() time from that stage. Also look at the load balancing column for the factorization and solve stage, it is well balanced? Barry On Dec 20, 2012, at 2:16 PM, Thomas Witkowski wrote: > In my multilevel FETI-DP code, I have localized course matrices, which are defined on only a subset of all MPI tasks, typically between 4 and 64 tasks. The MatAIJ and the KSP objects are both defined on a MPI communicator, which is a subset of MPI::COMM_WORLD. The LU factorization of the matrices is computed with either MUMPS or superlu_dist, but both show some scaling property I really wonder of: When the overall problem size is increased, the solve with the LU factorization of the local matrices does not scale! But why not? I just increase the number of local matrices, but all of them are independent of each other. Some example: I use 64 cores, each coarse matrix is spanned by 4 cores so there are 16 MPI communicators with 16 coarse space matrices. The problem need to solve 192 times with the coarse space systems, and this takes together 0.09 seconds. Now I increase the number of cores to 256, but let the local coarse space be defined again on only 4 cores. Again, 192 solutions with these coarse spaces are required, but now this takes 0.24 seconds. The same for 1024 cores, and we are at 1.7 seconds for the local coarse space solver! > > For me, this is a total mystery! Any idea how to explain, debug and eventually how to resolve this problem? > > Thomas From Thomas.Witkowski at tu-dresden.de Thu Dec 20 14:39:50 2012 From: Thomas.Witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 20 Dec 2012 21:39:50 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? 
In-Reply-To: <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> Message-ID: <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> I cannot use the information from log_summary, as I have three different LU factorizations and solve (local matrices and two hierarchies of coarse grids). Therefore, I use the following work around to get the timing of the solve I'm intrested in: MPI::COMM_WORLD.Barrier(); wtime = MPI::Wtime(); KSPSolve(*(data->ksp_schur_primal_local), tmp_primal, tmp_primal); FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); The factorization is done explicitly before with "KSPSetUp", so I can measure the time for LU factorization. It also does not scale! For 64 cores, I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, the local coarse space matrices defined on four cores have exactly the same number of rows and exactly the same number of non zero entries. So, from my point of view, the time should be absolutely constant. Thomas Zitat von Barry Smith : > > Are you timing ONLY the time to factor and solve the subproblems? > Or also the time to get the data to the collection of 4 cores at a > time? > > If you are only using LU for these problems and not elsewhere in > the code you can get the factorization and time from MatLUFactor() > and MatSolve() or you can use stages to put this calculation in its > own stage and use the MatLUFactor() and MatSolve() time from that > stage. > Also look at the load balancing column for the factorization and > solve stage, it is well balanced? > > Barry > > On Dec 20, 2012, at 2:16 PM, Thomas Witkowski > wrote: > >> In my multilevel FETI-DP code, I have localized course matrices, >> which are defined on only a subset of all MPI tasks, typically >> between 4 and 64 tasks. The MatAIJ and the KSP objects are both >> defined on a MPI communicator, which is a subset of >> MPI::COMM_WORLD. The LU factorization of the matrices is computed >> with either MUMPS or superlu_dist, but both show some scaling >> property I really wonder of: When the overall problem size is >> increased, the solve with the LU factorization of the local >> matrices does not scale! But why not? I just increase the number of >> local matrices, but all of them are independent of each other. >> Some example: I use 64 cores, each coarse matrix is spanned by 4 >> cores so there are 16 MPI communicators with 16 coarse space >> matrices. The problem need to solve 192 times with the coarse >> space systems, and this takes together 0.09 seconds. Now I >> increase the number of cores to 256, but let the local coarse >> space be defined again on only 4 cores. Again, 192 solutions with >> these coarse spaces are required, but now this takes 0.24 seconds. >> The same for 1024 cores, and we are at 1.7 seconds for the local >> coarse space solver! >> >> For me, this is a total mystery! Any idea how to explain, debug and >> eventually how to resolve this problem? >> >> Thomas > > From jack.poulson at gmail.com Thu Dec 20 14:53:34 2012 From: jack.poulson at gmail.com (Jack Poulson) Date: Thu, 20 Dec 2012 14:53:34 -0600 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? 
In-Reply-To: <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> Message-ID: Hi Thomas, Network topology is important. Since most machines are not fully connected, random subsets of four processes will become more scattered about the cluster as you increase your total number of processes. Jack On Dec 20, 2012 12:39 PM, "Thomas Witkowski" wrote: > I cannot use the information from log_summary, as I have three different > LU factorizations and solve (local matrices and two hierarchies of coarse > grids). Therefore, I use the following work around to get the timing of the > solve I'm intrested in: > > MPI::COMM_WORLD.Barrier(); > wtime = MPI::Wtime(); > KSPSolve(*(data->ksp_schur_**primal_local), tmp_primal, tmp_primal); > FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); > > The factorization is done explicitly before with "KSPSetUp", so I can > measure the time for LU factorization. It also does not scale! For 64 > cores, I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all > calculations, the local coarse space matrices defined on four cores have > exactly the same number of rows and exactly the same number of non zero > entries. So, from my point of view, the time should be absolutely constant. > > Thomas > > Zitat von Barry Smith : > > >> Are you timing ONLY the time to factor and solve the subproblems? Or >> also the time to get the data to the collection of 4 cores at a time? >> >> If you are only using LU for these problems and not elsewhere in the >> code you can get the factorization and time from MatLUFactor() and >> MatSolve() or you can use stages to put this calculation in its own stage >> and use the MatLUFactor() and MatSolve() time from that stage. >> Also look at the load balancing column for the factorization and solve >> stage, it is well balanced? >> >> Barry >> >> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski < >> thomas.witkowski at tu-dresden.**de > wrote: >> >> In my multilevel FETI-DP code, I have localized course matrices, which >>> are defined on only a subset of all MPI tasks, typically between 4 and 64 >>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>> communicator, which is a subset of MPI::COMM_WORLD. The LU factorization >>> of the matrices is computed with either MUMPS or superlu_dist, but both >>> show some scaling property I really wonder of: When the overall problem >>> size is increased, the solve with the LU factorization of the local >>> matrices does not scale! But why not? I just increase the number of local >>> matrices, but all of them are independent of each other. Some example: I >>> use 64 cores, each coarse matrix is spanned by 4 cores so there are 16 MPI >>> communicators with 16 coarse space matrices. The problem need to solve 192 >>> times with the coarse space systems, and this takes together 0.09 seconds. >>> Now I increase the number of cores to 256, but let the local coarse space >>> be defined again on only 4 cores. Again, 192 solutions with these coarse >>> spaces are required, but now this takes 0.24 seconds. The same for 1024 >>> cores, and we are at 1.7 seconds for the local coarse space solver! >>> >>> For me, this is a total mystery! Any idea how to explain, debug and >>> eventually how to resolve this problem? >>> >>> Thomas >>> >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Thomas.Witkowski at tu-dresden.de Thu Dec 20 15:01:29 2012 From: Thomas.Witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 20 Dec 2012 22:01:29 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> Message-ID: <20121220220129.5f4h5pbq8gwsc0w4@mail.zih.tu-dresden.de> Jack, I also considered this problem. The 4 MPI tasks of each coarse space matrix should run all on one node (each node contains 4 dual core CPUs). I'm not 100% sure, but I discussed this with the administrators of the system. The system should schedule always the first 8 ranks to the first node, and so on. And the coarse space matrices are build on ranks 0-3, 4-7 ... I'm running at the moment some benchmarks, where I replaced the local LU factorization from using UMFPACK to MUMPS. Each matrix and the corresponding ksp object are defined on PETSC_COMM_SELF and the problem is perfectly balanced (the grid is a unit square uniformly refined). Lets see... Thomas Zitat von Jack Poulson : > Hi Thomas, > > Network topology is important. Since most machines are not fully connected, > random subsets of four processes will become more scattered about the > cluster as you increase your total number of processes. > > Jack > On Dec 20, 2012 12:39 PM, "Thomas Witkowski" > wrote: > >> I cannot use the information from log_summary, as I have three different >> LU factorizations and solve (local matrices and two hierarchies of coarse >> grids). Therefore, I use the following work around to get the timing of the >> solve I'm intrested in: >> >> MPI::COMM_WORLD.Barrier(); >> wtime = MPI::Wtime(); >> KSPSolve(*(data->ksp_schur_**primal_local), tmp_primal, tmp_primal); >> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >> >> The factorization is done explicitly before with "KSPSetUp", so I can >> measure the time for LU factorization. It also does not scale! For 64 >> cores, I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all >> calculations, the local coarse space matrices defined on four cores have >> exactly the same number of rows and exactly the same number of non zero >> entries. So, from my point of view, the time should be absolutely constant. >> >> Thomas >> >> Zitat von Barry Smith : >> >> >>> Are you timing ONLY the time to factor and solve the subproblems? Or >>> also the time to get the data to the collection of 4 cores at a time? >>> >>> If you are only using LU for these problems and not elsewhere in the >>> code you can get the factorization and time from MatLUFactor() and >>> MatSolve() or you can use stages to put this calculation in its own stage >>> and use the MatLUFactor() and MatSolve() time from that stage. >>> Also look at the load balancing column for the factorization and solve >>> stage, it is well balanced? >>> >>> Barry >>> >>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski < >>> thomas.witkowski at tu-dresden.**de > wrote: >>> >>> In my multilevel FETI-DP code, I have localized course matrices, which >>>> are defined on only a subset of all MPI tasks, typically between 4 and 64 >>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>> communicator, which is a subset of MPI::COMM_WORLD. 
The LU factorization >>>> of the matrices is computed with either MUMPS or superlu_dist, but both >>>> show some scaling property I really wonder of: When the overall problem >>>> size is increased, the solve with the LU factorization of the local >>>> matrices does not scale! But why not? I just increase the number >>>> of local >>>> matrices, but all of them are independent of each other. Some example: I >>>> use 64 cores, each coarse matrix is spanned by 4 cores so there >>>> are 16 MPI >>>> communicators with 16 coarse space matrices. The problem need to >>>> solve 192 >>>> times with the coarse space systems, and this takes together >>>> 0.09 seconds. >>>> Now I increase the number of cores to 256, but let the local coarse space >>>> be defined again on only 4 cores. Again, 192 solutions with these coarse >>>> spaces are required, but now this takes 0.24 seconds. The same for 1024 >>>> cores, and we are at 1.7 seconds for the local coarse space solver! >>>> >>>> For me, this is a total mystery! Any idea how to explain, debug and >>>> eventually how to resolve this problem? >>>> >>>> Thomas >>>> >>> >>> >>> >> >> > From jack.poulson at gmail.com Thu Dec 20 15:07:18 2012 From: jack.poulson at gmail.com (Jack Poulson) Date: Thu, 20 Dec 2012 15:07:18 -0600 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <20121220220129.5f4h5pbq8gwsc0w4@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <20121220220129.5f4h5pbq8gwsc0w4@mail.zih.tu-dresden.de> Message-ID: Hi Thomas, Assuming this is not the issue (it is probably worth explicitly measuring), it is also important to ensure that the sparsity pattern is preserved, not just the number of nonzeros per row. A sparse matrix with random nonzero locations is much more expensive to factor than one with entries near the diagonal. Jack On Dec 20, 2012 1:01 PM, "Thomas Witkowski" wrote: > Jack, I also considered this problem. The 4 MPI tasks of each coarse space > matrix should run all on one node (each node contains 4 dual core CPUs). > I'm not 100% sure, but I discussed this with the administrators of the > system. The system should schedule always the first 8 ranks to the first > node, and so on. And the coarse space matrices are build on ranks 0-3, 4-7 > ... > > I'm running at the moment some benchmarks, where I replaced the local LU > factorization from using UMFPACK to MUMPS. Each matrix and the > corresponding ksp object are defined on PETSC_COMM_SELF and the problem is > perfectly balanced (the grid is a unit square uniformly refined). Lets > see... > > Thomas > > Zitat von Jack Poulson : > > Hi Thomas, >> >> Network topology is important. Since most machines are not fully >> connected, >> random subsets of four processes will become more scattered about the >> cluster as you increase your total number of processes. >> >> Jack >> On Dec 20, 2012 12:39 PM, "Thomas Witkowski" < >> Thomas.Witkowski at tu-dresden.**de > >> wrote: >> >> I cannot use the information from log_summary, as I have three different >>> LU factorizations and solve (local matrices and two hierarchies of coarse >>> grids). 
Therefore, I use the following work around to get the timing of >>> the >>> solve I'm intrested in: >>> >>> MPI::COMM_WORLD.Barrier(); >>> wtime = MPI::Wtime(); >>> KSPSolve(*(data->ksp_schur_****primal_local), tmp_primal, >>> tmp_primal); >>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>> >>> The factorization is done explicitly before with "KSPSetUp", so I can >>> measure the time for LU factorization. It also does not scale! For 64 >>> cores, I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all >>> calculations, the local coarse space matrices defined on four cores have >>> exactly the same number of rows and exactly the same number of non zero >>> entries. So, from my point of view, the time should be absolutely >>> constant. >>> >>> Thomas >>> >>> Zitat von Barry Smith : >>> >>> >>> Are you timing ONLY the time to factor and solve the subproblems? Or >>>> also the time to get the data to the collection of 4 cores at a time? >>>> >>>> If you are only using LU for these problems and not elsewhere in the >>>> code you can get the factorization and time from MatLUFactor() and >>>> MatSolve() or you can use stages to put this calculation in its own >>>> stage >>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>> Also look at the load balancing column for the factorization and solve >>>> stage, it is well balanced? >>>> >>>> Barry >>>> >>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski < >>>> thomas.witkowski at tu-dresden.****de >> >>>> wrote: >>>> >>>> In my multilevel FETI-DP code, I have localized course matrices, which >>>> >>>>> are defined on only a subset of all MPI tasks, typically between 4 >>>>> and 64 >>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU >>>>> factorization >>>>> of the matrices is computed with either MUMPS or superlu_dist, but >>>>> both >>>>> show some scaling property I really wonder of: When the overall >>>>> problem >>>>> size is increased, the solve with the LU factorization of the local >>>>> matrices does not scale! But why not? I just increase the number of >>>>> local >>>>> matrices, but all of them are independent of each other. Some >>>>> example: I >>>>> use 64 cores, each coarse matrix is spanned by 4 cores so there are >>>>> 16 MPI >>>>> communicators with 16 coarse space matrices. The problem need to >>>>> solve 192 >>>>> times with the coarse space systems, and this takes together 0.09 >>>>> seconds. >>>>> Now I increase the number of cores to 256, but let the local coarse >>>>> space >>>>> be defined again on only 4 cores. Again, 192 solutions with these >>>>> coarse >>>>> spaces are required, but now this takes 0.24 seconds. The same for >>>>> 1024 >>>>> cores, and we are at 1.7 seconds for the local coarse space solver! >>>>> >>>>> For me, this is a total mystery! Any idea how to explain, debug and >>>>> eventually how to resolve this problem? >>>>> >>>>> Thomas >>>>> >>>>> >>>> >>>> >>>> >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Thomas.Witkowski at tu-dresden.de Thu Dec 20 15:53:12 2012 From: Thomas.Witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 20 Dec 2012 22:53:12 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? 
In-Reply-To: <20121220220129.5f4h5pbq8gwsc0w4@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <20121220220129.5f4h5pbq8gwsc0w4@mail.zih.tu-dresden.de> Message-ID: <20121220225312.h78wmlgv4g8sggco@mail.zih.tu-dresden.de> So, I run the benchmark for 16, 64, 256 and 1024 MPI tasks. I replaced a local UMFPACK LU factorization of interior matrices and the 192 solves with them by MUMPS. It all three cases, the size, the structure and the values of the matrices are all the same. As expected, with UMFPACK both times for factorization and solve are the same for different number of cores. The MUMPS, this is not the case: 16 cores: factorization: 3.87 sec solves: 46 sec 64 cores: factorization: 4.29 sec solves: 70 sec 256 cores: factorization: 6.11 sec solves: 254 sec 1024 cores: factorization: 25.64 sec solves: forever :) This is really baaad! There is no communication (PETSC_COMM_SELF, MatAIJSeq) and all matrices are of the same size. What's going on here? May be its possible to reproduce this scenario with one of the PETSc examples? Thomas Zitat von Thomas Witkowski : > Jack, I also considered this problem. The 4 MPI tasks of each coarse > space matrix should run all on one node (each node contains 4 dual core > CPUs). I'm not 100% sure, but I discussed this with the administrators > of the system. The system should schedule always the first 8 ranks to > the first node, and so on. And the coarse space matrices are build on > ranks 0-3, 4-7 ... > > I'm running at the moment some benchmarks, where I replaced the local > LU factorization from using UMFPACK to MUMPS. Each matrix and the > corresponding ksp object are defined on PETSC_COMM_SELF and the problem > is perfectly balanced (the grid is a unit square uniformly refined). > Lets see... > > Thomas > > Zitat von Jack Poulson : > >> Hi Thomas, >> >> Network topology is important. Since most machines are not fully connected, >> random subsets of four processes will become more scattered about the >> cluster as you increase your total number of processes. >> >> Jack >> On Dec 20, 2012 12:39 PM, "Thomas Witkowski" >> >> wrote: >> >>> I cannot use the information from log_summary, as I have three different >>> LU factorizations and solve (local matrices and two hierarchies of coarse >>> grids). Therefore, I use the following work around to get the timing of the >>> solve I'm intrested in: >>> >>> MPI::COMM_WORLD.Barrier(); >>> wtime = MPI::Wtime(); >>> KSPSolve(*(data->ksp_schur_**primal_local), tmp_primal, tmp_primal); >>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>> >>> The factorization is done explicitly before with "KSPSetUp", so I can >>> measure the time for LU factorization. It also does not scale! For 64 >>> cores, I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all >>> calculations, the local coarse space matrices defined on four cores have >>> exactly the same number of rows and exactly the same number of non zero >>> entries. So, from my point of view, the time should be absolutely constant. >>> >>> Thomas >>> >>> Zitat von Barry Smith : >>> >>> >>>> Are you timing ONLY the time to factor and solve the subproblems? Or >>>> also the time to get the data to the collection of 4 cores at a time? 
>>>> >>>> If you are only using LU for these problems and not elsewhere in the >>>> code you can get the factorization and time from MatLUFactor() and >>>> MatSolve() or you can use stages to put this calculation in its own stage >>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>> Also look at the load balancing column for the factorization and solve >>>> stage, it is well balanced? >>>> >>>> Barry >>>> >>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski < >>>> thomas.witkowski at tu-dresden.**de > wrote: >>>> >>>> In my multilevel FETI-DP code, I have localized course matrices, which >>>>> are defined on only a subset of all MPI tasks, typically >>>>> between 4 and 64 >>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU factorization >>>>> of the matrices is computed with either MUMPS or superlu_dist, but both >>>>> show some scaling property I really wonder of: When the overall problem >>>>> size is increased, the solve with the LU factorization of the local >>>>> matrices does not scale! But why not? I just increase the number >>>>> of local >>>>> matrices, but all of them are independent of each other. Some example: I >>>>> use 64 cores, each coarse matrix is spanned by 4 cores so there >>>>> are 16 MPI >>>>> communicators with 16 coarse space matrices. The problem need >>>>> to solve 192 >>>>> times with the coarse space systems, and this takes together >>>>> 0.09 seconds. >>>>> Now I increase the number of cores to 256, but let the local >>>>> coarse space >>>>> be defined again on only 4 cores. Again, 192 solutions with these coarse >>>>> spaces are required, but now this takes 0.24 seconds. The same for 1024 >>>>> cores, and we are at 1.7 seconds for the local coarse space solver! >>>>> >>>>> For me, this is a total mystery! Any idea how to explain, debug and >>>>> eventually how to resolve this problem? >>>>> >>>>> Thomas >>>>> >>>> >>>> >>>> >>> >>> >> From knepley at gmail.com Thu Dec 20 19:19:45 2012 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 20 Dec 2012 20:19:45 -0500 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> Message-ID: On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski wrote: > I cannot use the information from log_summary, as I have three different LU > factorizations and solve (local matrices and two hierarchies of coarse > grids). Therefore, I use the following work around to get the timing of the > solve I'm intrested in: You misunderstand how to use logging. You just put these thing in separate stages. Stages represent parts of the code over which events are aggregated. Matt > MPI::COMM_WORLD.Barrier(); > wtime = MPI::Wtime(); > KSPSolve(*(data->ksp_schur_primal_local), tmp_primal, tmp_primal); > FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); > > The factorization is done explicitly before with "KSPSetUp", so I can > measure the time for LU factorization. It also does not scale! For 64 cores, > I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, the > local coarse space matrices defined on four cores have exactly the same > number of rows and exactly the same number of non zero entries. So, from my > point of view, the time should be absolutely constant. 
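As a concrete reference for the separate-stages suggestion above, here is a minimal sketch against the PETSc 3.3-era C API; the stage name, ksp_coarse and the wrapper function are illustrative placeholders, not code from this thread:

  #include <petscksp.h>

  /* Time the repeated coarse-space solves inside their own logging stage,
     so that -log_summary reports KSPSolve/MatSolve for this phase separately. */
  static PetscErrorCode CoarseSolves(KSP ksp_coarse, Vec rhs, Vec sol, PetscInt nsolves)
  {
    PetscErrorCode ierr;
    PetscLogStage  stage;
    PetscInt       i;

    PetscFunctionBegin;
    ierr = PetscLogStageRegister("Coarse space solves", &stage);CHKERRQ(ierr);
    ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
    for (i = 0; i < nsolves; i++) {
      ierr = KSPSolve(ksp_coarse, rhs, sol);CHKERRQ(ierr);
    }
    ierr = PetscLogStagePop();CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }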
> > Thomas > > Zitat von Barry Smith : > > >> >> Are you timing ONLY the time to factor and solve the subproblems? Or >> also the time to get the data to the collection of 4 cores at a time? >> >> If you are only using LU for these problems and not elsewhere in the >> code you can get the factorization and time from MatLUFactor() and >> MatSolve() or you can use stages to put this calculation in its own stage >> and use the MatLUFactor() and MatSolve() time from that stage. >> Also look at the load balancing column for the factorization and solve >> stage, it is well balanced? >> >> Barry >> >> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >> wrote: >> >>> In my multilevel FETI-DP code, I have localized course matrices, which >>> are defined on only a subset of all MPI tasks, typically between 4 and 64 >>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>> communicator, which is a subset of MPI::COMM_WORLD. The LU factorization of >>> the matrices is computed with either MUMPS or superlu_dist, but both show >>> some scaling property I really wonder of: When the overall problem size is >>> increased, the solve with the LU factorization of the local matrices does >>> not scale! But why not? I just increase the number of local matrices, but >>> all of them are independent of each other. Some example: I use 64 cores, >>> each coarse matrix is spanned by 4 cores so there are 16 MPI communicators >>> with 16 coarse space matrices. The problem need to solve 192 times with the >>> coarse space systems, and this takes together 0.09 seconds. Now I increase >>> the number of cores to 256, but let the local coarse space be defined again >>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>> required, but now this takes 0.24 seconds. The same for 1024 cores, and we >>> are at 1.7 seconds for the local coarse space solver! >>> >>> For me, this is a total mystery! Any idea how to explain, debug and >>> eventually how to resolve this problem? >>> >>> Thomas >> >> >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From aldo.bonfiglioli at unibas.it Fri Dec 21 03:29:51 2012 From: aldo.bonfiglioli at unibas.it (Aldo Bonfiglioli) Date: Fri, 21 Dec 2012 10:29:51 +0100 Subject: [petsc-users] PetscKernel_A_gets_inverse_A_ Message-ID: <50D42C0F.2090301@unibas.it> Dear all, would it be possible to have a unified interface (also Fortran callable) to the PetscKernel_A_gets_inverse_A_ routines? I find them very useful within my own piece of Fortran code to solve small dense linear system (which I have to do very frequently). I have my own interface, at present, but I need to change it as needed when a new PETSc version is released. Regards, Aldo -- Dr. Aldo Bonfiglioli Associate professor of Fluid Flow Machinery Scuola di Ingegneria Universita' della Basilicata V.le dell'Ateneo lucano, 10 85100 Potenza ITALY tel:+39.0971.205203 fax:+39.0971.205215 Publications list From thomas.witkowski at tu-dresden.de Fri Dec 21 03:36:02 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Fri, 21 Dec 2012 10:36:02 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? 
In-Reply-To: References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> Message-ID: <50D42D82.10603@tu-dresden.de> Okay, I did a similar benchmark now with PETSc's event logging: UMFPACK 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 MUMPS 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 As you see, the local solves with UMFPACK have nearly constant time with increasing number of subdomains. This is what I expect. The I replace UMFPACK by MUMPS and I see increasing time for local solves. In the last columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's column increases here from 75 to 82. What does this mean? Thomas Am 21.12.2012 02:19, schrieb Matthew Knepley: > On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski > wrote: >> I cannot use the information from log_summary, as I have three different LU >> factorizations and solve (local matrices and two hierarchies of coarse >> grids). Therefore, I use the following work around to get the timing of the >> solve I'm intrested in: > You misunderstand how to use logging. You just put these thing in > separate stages. Stages represent > parts of the code over which events are aggregated. > > Matt > >> MPI::COMM_WORLD.Barrier(); >> wtime = MPI::Wtime(); >> KSPSolve(*(data->ksp_schur_primal_local), tmp_primal, tmp_primal); >> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >> >> The factorization is done explicitly before with "KSPSetUp", so I can >> measure the time for LU factorization. It also does not scale! For 64 cores, >> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, the >> local coarse space matrices defined on four cores have exactly the same >> number of rows and exactly the same number of non zero entries. So, from my >> point of view, the time should be absolutely constant. >> >> Thomas >> >> Zitat von Barry Smith : >> >> >>> Are you timing ONLY the time to factor and solve the subproblems? Or >>> also the time to get the data to the collection of 4 cores at a time? >>> >>> If you are only using LU for these problems and not elsewhere in the >>> code you can get the factorization and time from MatLUFactor() and >>> MatSolve() or you can use stages to put this calculation in its own stage >>> and use the MatLUFactor() and MatSolve() time from that stage. >>> Also look at the load balancing column for the factorization and solve >>> stage, it is well balanced? >>> >>> Barry >>> >>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >>> wrote: >>> >>>> In my multilevel FETI-DP code, I have localized course matrices, which >>>> are defined on only a subset of all MPI tasks, typically between 4 and 64 >>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>> communicator, which is a subset of MPI::COMM_WORLD. 
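The "Local solve" rows in the tables above look like a user-registered logging event; a sketch of how such an event can be set up with the 3.3-era API follows (the event name and the bracketed call are illustrative, not necessarily how the numbers above were produced):

  #include <petscksp.h>

  /* Register a custom event once, then bracket each local solve with it;
     -log_summary then prints a single aggregated "Local solve" line. */
  static PetscLogEvent LOCAL_SOLVE_EVENT;

  static PetscErrorCode RegisterLocalSolveEvent(void)
  {
    PetscErrorCode ierr;
    PetscFunctionBegin;
    ierr = PetscLogEventRegister("Local solve", KSP_CLASSID, &LOCAL_SOLVE_EVENT);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  static PetscErrorCode TimedLocalSolve(KSP ksp, Vec b, Vec x)
  {
    PetscErrorCode ierr;
    PetscFunctionBegin;
    ierr = PetscLogEventBegin(LOCAL_SOLVE_EVENT, 0, 0, 0, 0);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = PetscLogEventEnd(LOCAL_SOLVE_EVENT, 0, 0, 0, 0);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }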
The LU factorization of >>>> the matrices is computed with either MUMPS or superlu_dist, but both show >>>> some scaling property I really wonder of: When the overall problem size is >>>> increased, the solve with the LU factorization of the local matrices does >>>> not scale! But why not? I just increase the number of local matrices, but >>>> all of them are independent of each other. Some example: I use 64 cores, >>>> each coarse matrix is spanned by 4 cores so there are 16 MPI communicators >>>> with 16 coarse space matrices. The problem need to solve 192 times with the >>>> coarse space systems, and this takes together 0.09 seconds. Now I increase >>>> the number of cores to 256, but let the local coarse space be defined again >>>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>>> required, but now this takes 0.24 seconds. The same for 1024 cores, and we >>>> are at 1.7 seconds for the local coarse space solver! >>>> >>>> For me, this is a total mystery! Any idea how to explain, debug and >>>> eventually how to resolve this problem? >>>> >>>> Thomas >>> >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener From aldo.bonfiglioli at unibas.it Fri Dec 21 06:04:31 2012 From: aldo.bonfiglioli at unibas.it (Aldo Bonfiglioli) Date: Fri, 21 Dec 2012 13:04:31 +0100 Subject: [petsc-users] VecSetBlockSize with release 3.3 Message-ID: <50D4504F.5010105@unibas.it> Dear all, I am in the process of upgrading from 3.2 to 3.3. I am a little bit puzzled by the following change: > VecSetBlockSize() cannot be called after VecCreateSeq() or > VecCreateMPI() and must be called before VecSetUp() or > VecSetFromOptions() or before either VecSetType() or VecSetSizes() With the earlier release I used to do the following: CALL VecCreateSeq(PETSC_COMM_SELF,NPOIN*NOFVAR,DT,IFAIL) C C IF(NOFVAR.GT.1) CALL VecSetBlockSize(DT,NOFVAR,IFAIL) with 3.3 it looks like the following is required : CALL VecCreate(PETSC_COMM_SELF,DT,IFAIL) CALL VecSetType(DT,VECSEQ,IFAIL) CALL VecSetBlockSize(DT,NOFVAR,IFAIL) CALL VecSetSizes(DT,NPOIN*NOFVAR,PETSC_DECIDE,IFAIL) Is there a simpler (i.e. less library calls) way to achieve the same result? Regards, Aldo -- Dr. Aldo Bonfiglioli Associate professor of Fluid Flow Machinery Scuola di Ingegneria Universita' della Basilicata V.le dell'Ateneo lucano, 10 85100 Potenza ITALY tel:+39.0971.205203 fax:+39.0971.205215 Publications list From knepley at gmail.com Fri Dec 21 06:52:04 2012 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 21 Dec 2012 07:52:04 -0500 Subject: [petsc-users] VecSetBlockSize with release 3.3 In-Reply-To: <50D4504F.5010105@unibas.it> References: <50D4504F.5010105@unibas.it> Message-ID: On Fri, Dec 21, 2012 at 7:04 AM, Aldo Bonfiglioli wrote: > Dear all, > I am in the process of upgrading from 3.2 to 3.3. 
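Aldo's ordering question is perhaps easiest to see next to a C version of the same sequence; the following is only an illustrative sketch mirroring the Fortran calls he quotes (npoin and nofvar are placeholders), and Matt's reply to the question continues just below:

  #include <petscvec.h>

  /* Sketch of the PETSc 3.3 ordering: the block size is set after the type
     but before the sizes, mirroring the Fortran calls quoted above. */
  static PetscErrorCode CreateBlockedVec(PetscInt npoin, PetscInt nofvar, Vec *dt)
  {
    PetscErrorCode ierr;
    PetscFunctionBegin;
    ierr = VecCreate(PETSC_COMM_SELF, dt);CHKERRQ(ierr);
    ierr = VecSetType(*dt, VECSEQ);CHKERRQ(ierr);
    ierr = VecSetBlockSize(*dt, nofvar);CHKERRQ(ierr);
    ierr = VecSetSizes(*dt, npoin*nofvar, PETSC_DECIDE);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }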
> > I am a little bit puzzled by the following change: >> VecSetBlockSize() cannot be called after VecCreateSeq() or >> VecCreateMPI() and must be called before VecSetUp() or >> VecSetFromOptions() or before either VecSetType() or VecSetSizes() > With the earlier release I used to do the following: > > CALL VecCreateSeq(PETSC_COMM_SELF,NPOIN*NOFVAR,DT,IFAIL) > C > C > IF(NOFVAR.GT.1) CALL VecSetBlockSize(DT,NOFVAR,IFAIL) > > with 3.3 it looks like the following is required : > > > CALL VecCreate(PETSC_COMM_SELF,DT,IFAIL) > CALL VecSetType(DT,VECSEQ,IFAIL) > CALL VecSetBlockSize(DT,NOFVAR,IFAIL) > CALL VecSetSizes(DT,NPOIN*NOFVAR,PETSC_DECIDE,IFAIL) > > Is there a simpler (i.e. less library calls) way to achieve the same result? No, there is a complicated set of dependencies here for setup. We discussed this and could not find an easier way to do it. Personally, I would never call SetType() in my code, only VecSetFromOptions(). Also, we call thee functions very rarely, since we almost always use VecDuplicate(), DMGetGlobal/LocalVector(), etc. Matt > Regards, > Aldo > -- > Dr. Aldo Bonfiglioli > Associate professor of Fluid Flow Machinery > Scuola di Ingegneria > Universita' della Basilicata > V.le dell'Ateneo lucano, 10 85100 Potenza ITALY > tel:+39.0971.205203 fax:+39.0971.205215 > > > Publications list -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From jedbrown at mcs.anl.gov Fri Dec 21 08:08:14 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Fri, 21 Dec 2012 07:08:14 -0700 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <50D42D82.10603@tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> Message-ID: MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). What MPI implementation have you been using? Is the behavior different with a different implementation? On Fri, Dec 21, 2012 at 2:36 AM, Thomas Witkowski < thomas.witkowski at tu-dresden.de> wrote: > Okay, I did a similar benchmark now with PETSc's event logging: > > UMFPACK > 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 > 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 > 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 > 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 > 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 0.0e+00 > 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 > > MUMPS > 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 0.0e+00 > 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 > 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 0.0e+00 > 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 > 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 0.0e+00 > 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 > > > As you see, the local solves with UMFPACK have nearly constant time with > increasing number of subdomains. This is what I expect. The I replace > UMFPACK by MUMPS and I see increasing time for local solves. In the last > columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's column > increases here from 75 to 82. What does this mean? 
> > Thomas > > Am 21.12.2012 02:19, schrieb Matthew Knepley: > > On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski >> > >> wrote: >> >>> I cannot use the information from log_summary, as I have three different >>> LU >>> factorizations and solve (local matrices and two hierarchies of coarse >>> grids). Therefore, I use the following work around to get the timing of >>> the >>> solve I'm intrested in: >>> >> You misunderstand how to use logging. You just put these thing in >> separate stages. Stages represent >> parts of the code over which events are aggregated. >> >> Matt >> >> MPI::COMM_WORLD.Barrier(); >>> wtime = MPI::Wtime(); >>> KSPSolve(*(data->ksp_schur_**primal_local), tmp_primal, >>> tmp_primal); >>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>> >>> The factorization is done explicitly before with "KSPSetUp", so I can >>> measure the time for LU factorization. It also does not scale! For 64 >>> cores, >>> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, >>> the >>> local coarse space matrices defined on four cores have exactly the same >>> number of rows and exactly the same number of non zero entries. So, from >>> my >>> point of view, the time should be absolutely constant. >>> >>> Thomas >>> >>> Zitat von Barry Smith : >>> >>> >>> Are you timing ONLY the time to factor and solve the subproblems? Or >>>> also the time to get the data to the collection of 4 cores at a time? >>>> >>>> If you are only using LU for these problems and not elsewhere in >>>> the >>>> code you can get the factorization and time from MatLUFactor() and >>>> MatSolve() or you can use stages to put this calculation in its own >>>> stage >>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>> Also look at the load balancing column for the factorization and solve >>>> stage, it is well balanced? >>>> >>>> Barry >>>> >>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >>>> > >>>> wrote: >>>> >>>> In my multilevel FETI-DP code, I have localized course matrices, which >>>>> are defined on only a subset of all MPI tasks, typically between 4 >>>>> and 64 >>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU >>>>> factorization of >>>>> the matrices is computed with either MUMPS or superlu_dist, but both >>>>> show >>>>> some scaling property I really wonder of: When the overall problem >>>>> size is >>>>> increased, the solve with the LU factorization of the local matrices >>>>> does >>>>> not scale! But why not? I just increase the number of local matrices, >>>>> but >>>>> all of them are independent of each other. Some example: I use 64 >>>>> cores, >>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI >>>>> communicators >>>>> with 16 coarse space matrices. The problem need to solve 192 times >>>>> with the >>>>> coarse space systems, and this takes together 0.09 seconds. Now I >>>>> increase >>>>> the number of cores to 256, but let the local coarse space be defined >>>>> again >>>>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>>>> required, but now this takes 0.24 seconds. The same for 1024 cores, >>>>> and we >>>>> are at 1.7 seconds for the local coarse space solver! >>>>> >>>>> For me, this is a total mystery! Any idea how to explain, debug and >>>>> eventually how to resolve this problem? 
>>>>> >>>>> Thomas >>>>> >>>> >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Thomas.Witkowski at tu-dresden.de Fri Dec 21 09:51:12 2012 From: Thomas.Witkowski at tu-dresden.de (Thomas Witkowski) Date: Fri, 21 Dec 2012 16:51:12 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> Message-ID: <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> I use a modified MPICH version. On the system I use for these benchmarks I cannot use another MPI library. I'm not fixed to MUMPS. Superlu_dist, for example, works also perfectly for this. But there is still the following problem I cannot solve: When I increase the number of coarse space matrices, there seems to be no scaling direct solver for this. Just to summaries: - one coarse space matrix is created always by one "cluster" consisting of four subdomanins/MPI tasks - the four tasks are always local to one node, thus inter-node network communication is not required for computing factorization and solve - independent of the number of cluster, the coarse space matrices are the same, have the same number of rows, nnz structure but possibly different values - there is NO load unbalancing - the matrices must be factorized and there are a lot of solves (> 100) with them It should be pretty clear, that computing LU factorization and solving with it should scale perfectly. But at the moment, all direct solver I tried (mumps, superlu_dist, pastix) are not able to scale. The loos of scale is really worse, as you can see from the numbers I send before. Any ideas? Suggestions? Without a scaling solver method for these kind of systems, my multilevel FETI-DP code is just more or less a joke, only some orders of magnitude slower than standard FETI-DP method :) Thomas Zitat von Jed Brown : > MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). What MPI > implementation have you been using? Is the behavior different with a > different implementation? > > > On Fri, Dec 21, 2012 at 2:36 AM, Thomas Witkowski < > thomas.witkowski at tu-dresden.de> wrote: > >> Okay, I did a similar benchmark now with PETSc's event logging: >> >> UMFPACK >> 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 >> 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 >> 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 >> 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 >> 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 0.0e+00 >> 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 >> >> MUMPS >> 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 0.0e+00 >> 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 >> 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 0.0e+00 >> 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 >> 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 0.0e+00 >> 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 >> >> >> As you see, the local solves with UMFPACK have nearly constant time with >> increasing number of subdomains. This is what I expect. The I replace >> UMFPACK by MUMPS and I see increasing time for local solves. 
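For the factor-once / solve-many pattern Thomas summarizes above, a sketch of the usual PETSc 3.3-era setup (KSPPREONLY plus PCLU, factorization triggered once by KSPSetUp, solver package chosen the same way as -pc_factor_mat_solver_package); the communicator, matrix and function name are placeholders, not code from this thread:

  #include <petscksp.h>

  /* One LU factorization up front (KSPSetUp), then many cheap solves via
     KSPSolve.  "mumps" matches -pc_factor_mat_solver_package mumps;
     superlu_dist or umfpack can be substituted the same way. */
  static PetscErrorCode SetUpDirectSolver(MPI_Comm subcomm, Mat Acoarse, KSP *ksp)
  {
    PetscErrorCode ierr;
    PC             pc;

    PetscFunctionBegin;
    ierr = KSPCreate(subcomm, ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(*ksp, Acoarse, Acoarse, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
    ierr = KSPSetType(*ksp, KSPPREONLY);CHKERRQ(ierr);
    ierr = KSPGetPC(*ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
    ierr = PCFactorSetMatSolverPackage(pc, "mumps");CHKERRQ(ierr);
    ierr = KSPSetFromOptions(*ksp);CHKERRQ(ierr);
    ierr = KSPSetUp(*ksp);CHKERRQ(ierr);   /* LU factorization happens here */
    PetscFunctionReturn(0);
  }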
In the last >> columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's column >> increases here from 75 to 82. What does this mean? >> >> Thomas >> >> Am 21.12.2012 02:19, schrieb Matthew Knepley: >> >> On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski >>> > >>> wrote: >>> >>>> I cannot use the information from log_summary, as I have three different >>>> LU >>>> factorizations and solve (local matrices and two hierarchies of coarse >>>> grids). Therefore, I use the following work around to get the timing of >>>> the >>>> solve I'm intrested in: >>>> >>> You misunderstand how to use logging. You just put these thing in >>> separate stages. Stages represent >>> parts of the code over which events are aggregated. >>> >>> Matt >>> >>> MPI::COMM_WORLD.Barrier(); >>>> wtime = MPI::Wtime(); >>>> KSPSolve(*(data->ksp_schur_**primal_local), tmp_primal, >>>> tmp_primal); >>>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>>> >>>> The factorization is done explicitly before with "KSPSetUp", so I can >>>> measure the time for LU factorization. It also does not scale! For 64 >>>> cores, >>>> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, >>>> the >>>> local coarse space matrices defined on four cores have exactly the same >>>> number of rows and exactly the same number of non zero entries. So, from >>>> my >>>> point of view, the time should be absolutely constant. >>>> >>>> Thomas >>>> >>>> Zitat von Barry Smith : >>>> >>>> >>>> Are you timing ONLY the time to factor and solve the subproblems? Or >>>>> also the time to get the data to the collection of 4 cores at a time? >>>>> >>>>> If you are only using LU for these problems and not elsewhere in >>>>> the >>>>> code you can get the factorization and time from MatLUFactor() and >>>>> MatSolve() or you can use stages to put this calculation in its own >>>>> stage >>>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>>> Also look at the load balancing column for the factorization and solve >>>>> stage, it is well balanced? >>>>> >>>>> Barry >>>>> >>>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >>>>> > >>>>> wrote: >>>>> >>>>> In my multilevel FETI-DP code, I have localized course matrices, which >>>>>> are defined on only a subset of all MPI tasks, typically between 4 >>>>>> and 64 >>>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU >>>>>> factorization of >>>>>> the matrices is computed with either MUMPS or superlu_dist, but both >>>>>> show >>>>>> some scaling property I really wonder of: When the overall problem >>>>>> size is >>>>>> increased, the solve with the LU factorization of the local matrices >>>>>> does >>>>>> not scale! But why not? I just increase the number of local matrices, >>>>>> but >>>>>> all of them are independent of each other. Some example: I use 64 >>>>>> cores, >>>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI >>>>>> communicators >>>>>> with 16 coarse space matrices. The problem need to solve 192 times >>>>>> with the >>>>>> coarse space systems, and this takes together 0.09 seconds. Now I >>>>>> increase >>>>>> the number of cores to 256, but let the local coarse space be defined >>>>>> again >>>>>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>>>>> required, but now this takes 0.24 seconds. The same for 1024 cores, >>>>>> and we >>>>>> are at 1.7 seconds for the local coarse space solver! >>>>>> >>>>>> For me, this is a total mystery! 
Any idea how to explain, debug and >>>>>> eventually how to resolve this problem? >>>>>> >>>>>> Thomas >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which >>> their experiments lead. >>> -- Norbert Wiener >>> >> >> > From agrayver at gfz-potsdam.de Fri Dec 21 10:00:10 2012 From: agrayver at gfz-potsdam.de (Alexander Grayver) Date: Fri, 21 Dec 2012 17:00:10 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> Message-ID: <50D4878A.9080004@gfz-potsdam.de> Thomas, I'm missing one point... You run N sequential factorizations (i.e. each has its own matrix to work with and no need to communicate?) independently within ONE node? Or there are N factorizations that run on N nodes? Jed, > MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). Any reason they do it that way? Which part of the code is that (i.e. analysis/factorization/solution.)? Regards, Alexander On 21.12.2012 16:51, Thomas Witkowski wrote: > I use a modified MPICH version. On the system I use for these > benchmarks I cannot use another MPI library. > > I'm not fixed to MUMPS. Superlu_dist, for example, works also > perfectly for this. But there is still the following problem I cannot > solve: When I increase the number of coarse space matrices, there > seems to be no scaling direct solver for this. Just to summaries: > - one coarse space matrix is created always by one "cluster" > consisting of four subdomanins/MPI tasks > - the four tasks are always local to one node, thus inter-node network > communication is not required for computing factorization and solve > - independent of the number of cluster, the coarse space matrices are > the same, have the same number of rows, nnz structure but possibly > different values > - there is NO load unbalancing > - the matrices must be factorized and there are a lot of solves (> > 100) with them > > It should be pretty clear, that computing LU factorization and solving > with it should scale perfectly. But at the moment, all direct solver I > tried (mumps, superlu_dist, pastix) are not able to scale. The loos of > scale is really worse, as you can see from the numbers I send before. > > Any ideas? Suggestions? Without a scaling solver method for these kind > of systems, my multilevel FETI-DP code is just more or less a joke, > only some orders of magnitude slower than standard FETI-DP method :) > > Thomas From knepley at gmail.com Fri Dec 21 10:00:21 2012 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 21 Dec 2012 11:00:21 -0500 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> Message-ID: On Fri, Dec 21, 2012 at 10:51 AM, Thomas Witkowski wrote: > I use a modified MPICH version. 
On the system I use for these benchmarks I > cannot use another MPI library. > > I'm not fixed to MUMPS. Superlu_dist, for example, works also perfectly for > this. But there is still the following problem I cannot solve: When I > increase the number of coarse space matrices, there seems to be no scaling > direct solver for this. Just to summaries: > - one coarse space matrix is created always by one "cluster" consisting of > four subdomanins/MPI tasks > - the four tasks are always local to one node, thus inter-node network > communication is not required for computing factorization and solve > - independent of the number of cluster, the coarse space matrices are the > same, have the same number of rows, nnz structure but possibly different > values > - there is NO load unbalancing > - the matrices must be factorized and there are a lot of solves (> 100) with > them So the numbers you have below for UMFPACK are using one matrix per MPI rank instead of one matrix per 4 ranks? There seem to be two obvious sources of bugs: 1) Your parallel solver is not just using the comm with 4 ranks 2) These ranks are not clustered together on one node for that comm Matt > It should be pretty clear, that computing LU factorization and solving with > it should scale perfectly. But at the moment, all direct solver I tried > (mumps, superlu_dist, pastix) are not able to scale. The loos of scale is > really worse, as you can see from the numbers I send before. > > Any ideas? Suggestions? Without a scaling solver method for these kind of > systems, my multilevel FETI-DP code is just more or less a joke, only some > orders of magnitude slower than standard FETI-DP method :) > > Thomas > > Zitat von Jed Brown : > >> MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). What MPI >> implementation have you been using? Is the behavior different with a >> different implementation? >> >> >> On Fri, Dec 21, 2012 at 2:36 AM, Thomas Witkowski < >> thomas.witkowski at tu-dresden.de> wrote: >> >>> Okay, I did a similar benchmark now with PETSc's event logging: >>> >>> UMFPACK >>> 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 >>> 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 >>> 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 >>> >>> MUMPS >>> 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 >>> 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 >>> 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 >>> >>> >>> As you see, the local solves with UMFPACK have nearly constant time with >>> increasing number of subdomains. This is what I expect. The I replace >>> UMFPACK by MUMPS and I see increasing time for local solves. In the last >>> columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's >>> column >>> increases here from 75 to 82. What does this mean? >>> >>> Thomas >>> >>> Am 21.12.2012 02:19, schrieb Matthew Knepley: >>> >>> On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski >>>> >>>> > >>>> >>>> wrote: >>>> >>>>> I cannot use the information from log_summary, as I have three >>>>> different >>>>> LU >>>>> factorizations and solve (local matrices and two hierarchies of coarse >>>>> grids). 
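On Matt's second point, a quick diagnostic sketch for checking which node each rank of a cluster communicator actually landed on; subcomm is assumed to be the per-cluster (4-rank) communicator and the printed format is arbitrary:

  #include <petscsys.h>

  /* Print, per rank, the world rank, the rank within the cluster communicator
     and the host name, to verify that each group of four ranks shares a node. */
  static PetscErrorCode ReportPlacement(MPI_Comm subcomm)
  {
    PetscErrorCode ierr;
    PetscMPIInt    wrank, srank, len;
    char           host[MPI_MAX_PROCESSOR_NAME];

    PetscFunctionBegin;
    ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &wrank);CHKERRQ(ierr);
    ierr = MPI_Comm_rank(subcomm, &srank);CHKERRQ(ierr);
    ierr = MPI_Get_processor_name(host, &len);CHKERRQ(ierr);
    ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "world rank %d, subcomm rank %d, node %s\n",
                                   wrank, srank, host);CHKERRQ(ierr);
    ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }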
Therefore, I use the following work around to get the timing of >>>>> the >>>>> solve I'm intrested in: >>>>> >>>> You misunderstand how to use logging. You just put these thing in >>>> separate stages. Stages represent >>>> parts of the code over which events are aggregated. >>>> >>>> Matt >>>> >>>> MPI::COMM_WORLD.Barrier(); >>>>> >>>>> wtime = MPI::Wtime(); >>>>> KSPSolve(*(data->ksp_schur_**primal_local), tmp_primal, >>>>> >>>>> tmp_primal); >>>>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>>>> >>>>> The factorization is done explicitly before with "KSPSetUp", so I can >>>>> measure the time for LU factorization. It also does not scale! For 64 >>>>> cores, >>>>> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, >>>>> the >>>>> local coarse space matrices defined on four cores have exactly the same >>>>> number of rows and exactly the same number of non zero entries. So, >>>>> from >>>>> my >>>>> point of view, the time should be absolutely constant. >>>>> >>>>> Thomas >>>>> >>>>> Zitat von Barry Smith : >>>>> >>>>> >>>>> Are you timing ONLY the time to factor and solve the subproblems? >>>>> Or >>>>>> >>>>>> also the time to get the data to the collection of 4 cores at a time? >>>>>> >>>>>> If you are only using LU for these problems and not elsewhere in >>>>>> the >>>>>> code you can get the factorization and time from MatLUFactor() and >>>>>> MatSolve() or you can use stages to put this calculation in its own >>>>>> stage >>>>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>>>> Also look at the load balancing column for the factorization and >>>>>> solve >>>>>> stage, it is well balanced? >>>>>> >>>>>> Barry >>>>>> >>>>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >>>>>> > >>>>>> >>>>>> wrote: >>>>>> >>>>>> In my multilevel FETI-DP code, I have localized course matrices, >>>>>> which >>>>>>> >>>>>>> are defined on only a subset of all MPI tasks, typically between 4 >>>>>>> and 64 >>>>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU >>>>>>> factorization of >>>>>>> the matrices is computed with either MUMPS or superlu_dist, but both >>>>>>> show >>>>>>> some scaling property I really wonder of: When the overall problem >>>>>>> size is >>>>>>> increased, the solve with the LU factorization of the local matrices >>>>>>> does >>>>>>> not scale! But why not? I just increase the number of local >>>>>>> matrices, >>>>>>> but >>>>>>> all of them are independent of each other. Some example: I use 64 >>>>>>> cores, >>>>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI >>>>>>> communicators >>>>>>> with 16 coarse space matrices. The problem need to solve 192 times >>>>>>> with the >>>>>>> coarse space systems, and this takes together 0.09 seconds. Now I >>>>>>> increase >>>>>>> the number of cores to 256, but let the local coarse space be >>>>>>> defined >>>>>>> again >>>>>>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>>>>>> required, but now this takes 0.24 seconds. The same for 1024 cores, >>>>>>> and we >>>>>>> are at 1.7 seconds for the local coarse space solver! >>>>>>> >>>>>>> For me, this is a total mystery! Any idea how to explain, debug and >>>>>>> eventually how to resolve this problem? 
>>>>>>> >>>>>>> Thomas >>>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which >>>> their experiments lead. >>>> -- Norbert Wiener >>>> >>> >>> >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener From jedbrown at mcs.anl.gov Fri Dec 21 10:01:27 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Fri, 21 Dec 2012 09:01:27 -0700 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> Message-ID: Can you reproduce this in a simpler environment so that we can report it? As I understand your statement, it sounds like you could reproduce by changing src/ksp/ksp/examples/tutorials/ex10.c to create a subcomm of size 4 and the using that everywhere, then compare log_summary running on 4 cores to running on more (despite everything really being independent) It would also be worth using an MPI profiler to see if it's really spending a lot of time in MPI_Iprobe. Since SuperLU_DIST does not use MPI_Iprobe, it may be something else. On Fri, Dec 21, 2012 at 8:51 AM, Thomas Witkowski < Thomas.Witkowski at tu-dresden.de> wrote: > I use a modified MPICH version. On the system I use for these benchmarks I > cannot use another MPI library. > > I'm not fixed to MUMPS. Superlu_dist, for example, works also perfectly > for this. But there is still the following problem I cannot solve: When I > increase the number of coarse space matrices, there seems to be no scaling > direct solver for this. Just to summaries: > - one coarse space matrix is created always by one "cluster" consisting of > four subdomanins/MPI tasks > - the four tasks are always local to one node, thus inter-node network > communication is not required for computing factorization and solve > - independent of the number of cluster, the coarse space matrices are the > same, have the same number of rows, nnz structure but possibly different > values > - there is NO load unbalancing > - the matrices must be factorized and there are a lot of solves (> 100) > with them > > It should be pretty clear, that computing LU factorization and solving > with it should scale perfectly. But at the moment, all direct solver I > tried (mumps, superlu_dist, pastix) are not able to scale. The loos of > scale is really worse, as you can see from the numbers I send before. > > Any ideas? Suggestions? Without a scaling solver method for these kind of > systems, my multilevel FETI-DP code is just more or less a joke, only some > orders of magnitude slower than standard FETI-DP method :) > > Thomas > > Zitat von Jed Brown : > > MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). What MPI >> implementation have you been using? Is the behavior different with a >> different implementation? 
>> >> >> On Fri, Dec 21, 2012 at 2:36 AM, Thomas Witkowski < >> thomas.witkowski at tu-dresden.de**> wrote: >> >> Okay, I did a similar benchmark now with PETSc's event logging: >>> >>> UMFPACK >>> 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 >>> 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 >>> 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 >>> >>> MUMPS >>> 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 >>> 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 >>> 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 0.0e+00 >>> 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 >>> >>> >>> As you see, the local solves with UMFPACK have nearly constant time with >>> increasing number of subdomains. This is what I expect. The I replace >>> UMFPACK by MUMPS and I see increasing time for local solves. In the last >>> columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's >>> column >>> increases here from 75 to 82. What does this mean? >>> >>> Thomas >>> >>> Am 21.12.2012 02:19, schrieb Matthew Knepley: >>> >>> On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski >>> >>>> >>>> >> >>>> >>>> wrote: >>>> >>>> I cannot use the information from log_summary, as I have three >>>>> different >>>>> LU >>>>> factorizations and solve (local matrices and two hierarchies of coarse >>>>> grids). Therefore, I use the following work around to get the timing of >>>>> the >>>>> solve I'm intrested in: >>>>> >>>>> You misunderstand how to use logging. You just put these thing in >>>> separate stages. Stages represent >>>> parts of the code over which events are aggregated. >>>> >>>> Matt >>>> >>>> MPI::COMM_WORLD.Barrier(); >>>> >>>>> wtime = MPI::Wtime(); >>>>> KSPSolve(*(data->ksp_schur_****primal_local), tmp_primal, >>>>> >>>>> tmp_primal); >>>>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>>>> >>>>> The factorization is done explicitly before with "KSPSetUp", so I can >>>>> measure the time for LU factorization. It also does not scale! For 64 >>>>> cores, >>>>> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, >>>>> the >>>>> local coarse space matrices defined on four cores have exactly the same >>>>> number of rows and exactly the same number of non zero entries. So, >>>>> from >>>>> my >>>>> point of view, the time should be absolutely constant. >>>>> >>>>> Thomas >>>>> >>>>> Zitat von Barry Smith : >>>>> >>>>> >>>>> Are you timing ONLY the time to factor and solve the subproblems? >>>>> Or >>>>> >>>>>> also the time to get the data to the collection of 4 cores at a time? >>>>>> >>>>>> If you are only using LU for these problems and not elsewhere in >>>>>> the >>>>>> code you can get the factorization and time from MatLUFactor() and >>>>>> MatSolve() or you can use stages to put this calculation in its own >>>>>> stage >>>>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>>>> Also look at the load balancing column for the factorization and >>>>>> solve >>>>>> stage, it is well balanced? 
>>>>>> >>>>>> Barry >>>>>> >>>>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >>>>>> >>>>>> >> >>>>>> >>>>>> wrote: >>>>>> >>>>>> In my multilevel FETI-DP code, I have localized course matrices, >>>>>> which >>>>>> >>>>>>> are defined on only a subset of all MPI tasks, typically between 4 >>>>>>> and 64 >>>>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU >>>>>>> factorization of >>>>>>> the matrices is computed with either MUMPS or superlu_dist, but both >>>>>>> show >>>>>>> some scaling property I really wonder of: When the overall problem >>>>>>> size is >>>>>>> increased, the solve with the LU factorization of the local matrices >>>>>>> does >>>>>>> not scale! But why not? I just increase the number of local >>>>>>> matrices, >>>>>>> but >>>>>>> all of them are independent of each other. Some example: I use 64 >>>>>>> cores, >>>>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI >>>>>>> communicators >>>>>>> with 16 coarse space matrices. The problem need to solve 192 times >>>>>>> with the >>>>>>> coarse space systems, and this takes together 0.09 seconds. Now I >>>>>>> increase >>>>>>> the number of cores to 256, but let the local coarse space be >>>>>>> defined >>>>>>> again >>>>>>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>>>>>> required, but now this takes 0.24 seconds. The same for 1024 cores, >>>>>>> and we >>>>>>> are at 1.7 seconds for the local coarse space solver! >>>>>>> >>>>>>> For me, this is a total mystery! Any idea how to explain, debug and >>>>>>> eventually how to resolve this problem? >>>>>>> >>>>>>> Thomas >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which >>>> their experiments lead. >>>> -- Norbert Wiener >>>> >>>> >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Fri Dec 21 10:04:09 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Fri, 21 Dec 2012 09:04:09 -0700 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <50D4878A.9080004@gfz-potsdam.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> <50D4878A.9080004@gfz-potsdam.de> Message-ID: On Fri, Dec 21, 2012 at 9:00 AM, Alexander Grayver wrote: > > MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). > > Any reason they do it that way? Which part of the code is that (i.e. > analysis/factorization/**solution.)? > They should Iprobe on the proper communicator. I don't know if it's a mistake in MUMPS or if they were working around a historical bug in some MPI implementation. At this point, we don't have sufficiently detailed profiling to determine that this has anything to do with the strange performance degradation that Thomas is seeing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Thomas.Witkowski at tu-dresden.de Fri Dec 21 15:05:21 2012 From: Thomas.Witkowski at tu-dresden.de (Thomas Witkowski) Date: Fri, 21 Dec 2012 22:05:21 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? 
In-Reply-To: References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> Message-ID: <20121221220521.qbp4io8kws040o8g@mail.zih.tu-dresden.de> So, here it is. Just compile and run with mpiexec -np 64 ./ex10 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -log_summary 64 cores: 0.09 seconds for solving 1024 cores: 2.6 seconds for solving Thomas Zitat von Jed Brown : > Can you reproduce this in a simpler environment so that we can report it? > As I understand your statement, it sounds like you could reproduce by > changing src/ksp/ksp/examples/tutorials/ex10.c to create a subcomm of size > 4 and the using that everywhere, then compare log_summary running on 4 > cores to running on more (despite everything really being independent) > > It would also be worth using an MPI profiler to see if it's really spending > a lot of time in MPI_Iprobe. Since SuperLU_DIST does not use MPI_Iprobe, it > may be something else. > > On Fri, Dec 21, 2012 at 8:51 AM, Thomas Witkowski < > Thomas.Witkowski at tu-dresden.de> wrote: > >> I use a modified MPICH version. On the system I use for these benchmarks I >> cannot use another MPI library. >> >> I'm not fixed to MUMPS. Superlu_dist, for example, works also perfectly >> for this. But there is still the following problem I cannot solve: When I >> increase the number of coarse space matrices, there seems to be no scaling >> direct solver for this. Just to summaries: >> - one coarse space matrix is created always by one "cluster" consisting of >> four subdomanins/MPI tasks >> - the four tasks are always local to one node, thus inter-node network >> communication is not required for computing factorization and solve >> - independent of the number of cluster, the coarse space matrices are the >> same, have the same number of rows, nnz structure but possibly different >> values >> - there is NO load unbalancing >> - the matrices must be factorized and there are a lot of solves (> 100) >> with them >> >> It should be pretty clear, that computing LU factorization and solving >> with it should scale perfectly. But at the moment, all direct solver I >> tried (mumps, superlu_dist, pastix) are not able to scale. The loos of >> scale is really worse, as you can see from the numbers I send before. >> >> Any ideas? Suggestions? Without a scaling solver method for these kind of >> systems, my multilevel FETI-DP code is just more or less a joke, only some >> orders of magnitude slower than standard FETI-DP method :) >> >> Thomas >> >> Zitat von Jed Brown : >> >> MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). What MPI >>> implementation have you been using? Is the behavior different with a >>> different implementation? 
>>> >>> >>> On Fri, Dec 21, 2012 at 2:36 AM, Thomas Witkowski < >>> thomas.witkowski at tu-dresden.de**> wrote: >>> >>> Okay, I did a similar benchmark now with PETSc's event logging: >>>> >>>> UMFPACK >>>> 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 >>>> 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 >>>> 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 >>>> 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 >>>> 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 0.0e+00 >>>> 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 >>>> >>>> MUMPS >>>> 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 0.0e+00 >>>> 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 >>>> 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 0.0e+00 >>>> 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 >>>> 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 0.0e+00 >>>> 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 >>>> >>>> >>>> As you see, the local solves with UMFPACK have nearly constant time with >>>> increasing number of subdomains. This is what I expect. The I replace >>>> UMFPACK by MUMPS and I see increasing time for local solves. In the last >>>> columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's >>>> column >>>> increases here from 75 to 82. What does this mean? >>>> >>>> Thomas >>>> >>>> Am 21.12.2012 02:19, schrieb Matthew Knepley: >>>> >>>> On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski >>>> >>>>> >>>> >>>>> >> >>>>> >>>>> wrote: >>>>> >>>>> I cannot use the information from log_summary, as I have three >>>>>> different >>>>>> LU >>>>>> factorizations and solve (local matrices and two hierarchies of coarse >>>>>> grids). Therefore, I use the following work around to get the timing of >>>>>> the >>>>>> solve I'm intrested in: >>>>>> >>>>>> You misunderstand how to use logging. You just put these thing in >>>>> separate stages. Stages represent >>>>> parts of the code over which events are aggregated. >>>>> >>>>> Matt >>>>> >>>>> MPI::COMM_WORLD.Barrier(); >>>>> >>>>>> wtime = MPI::Wtime(); >>>>>> KSPSolve(*(data->ksp_schur_****primal_local), tmp_primal, >>>>>> >>>>>> tmp_primal); >>>>>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>>>>> >>>>>> The factorization is done explicitly before with "KSPSetUp", so I can >>>>>> measure the time for LU factorization. It also does not scale! For 64 >>>>>> cores, >>>>>> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations, >>>>>> the >>>>>> local coarse space matrices defined on four cores have exactly the same >>>>>> number of rows and exactly the same number of non zero entries. So, >>>>>> from >>>>>> my >>>>>> point of view, the time should be absolutely constant. >>>>>> >>>>>> Thomas >>>>>> >>>>>> Zitat von Barry Smith : >>>>>> >>>>>> >>>>>> Are you timing ONLY the time to factor and solve the subproblems? >>>>>> Or >>>>>> >>>>>>> also the time to get the data to the collection of 4 cores at a time? >>>>>>> >>>>>>> If you are only using LU for these problems and not elsewhere in >>>>>>> the >>>>>>> code you can get the factorization and time from MatLUFactor() and >>>>>>> MatSolve() or you can use stages to put this calculation in its own >>>>>>> stage >>>>>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>>>>> Also look at the load balancing column for the factorization and >>>>>>> solve >>>>>>> stage, it is well balanced? 
>>>>>>> Barry
>>>>>>>
>>>>>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski wrote:
>>>>>>>
>>>>>>>> In my multilevel FETI-DP code, I have localized course matrices, which
>>>>>>>> are defined on only a subset of all MPI tasks, typically between 4 and 64
>>>>>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI
>>>>>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU factorization of
>>>>>>>> the matrices is computed with either MUMPS or superlu_dist, but both show
>>>>>>>> some scaling property I really wonder of: When the overall problem size is
>>>>>>>> increased, the solve with the LU factorization of the local matrices does
>>>>>>>> not scale! But why not? I just increase the number of local matrices, but
>>>>>>>> all of them are independent of each other. Some example: I use 64 cores,
>>>>>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI communicators
>>>>>>>> with 16 coarse space matrices. The problem need to solve 192 times with the
>>>>>>>> coarse space systems, and this takes together 0.09 seconds. Now I increase
>>>>>>>> the number of cores to 256, but let the local coarse space be defined again
>>>>>>>> on only 4 cores. Again, 192 solutions with these coarse spaces are
>>>>>>>> required, but now this takes 0.24 seconds. The same for 1024 cores, and we
>>>>>>>> are at 1.7 seconds for the local coarse space solver!
>>>>>>>>
>>>>>>>> For me, this is a total mystery! Any idea how to explain, debug and
>>>>>>>> eventually how to resolve this problem?
>>>>>>>>
>>>>>>>> Thomas
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which
>>>>> their experiments lead.
>>>>> -- Norbert Wiener
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex10.c
Type: text/x-c++src
Size: 3496 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Desc: not available
URL:
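The modified ex10.c attachment itself was scrubbed, so the following is only an illustrative sketch of the kind of change Jed suggested (groups of four consecutive ranks on their own communicator, with the solver chosen via the options given in the run line above); it is an assumption about the structure, not Thomas's actual file:

  #include <petscksp.h>

  /* Group every four consecutive world ranks into one communicator and build
     the matrix and KSP there, so each group factors and solves its own
     independent system.  Matrix assembly and the right-hand side are omitted. */
  int main(int argc, char **argv)
  {
    PetscErrorCode ierr;
    PetscMPIInt    rank;
    MPI_Comm       subcomm;
    Mat            A;
    KSP            ksp;

    ierr = PetscInitialize(&argc, &argv, (char*)0, (char*)0); if (ierr) return ierr;
    ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
    /* color = rank/4 groups ranks {0..3}, {4..7}, ... into clusters of four */
    ierr = MPI_Comm_split(PETSC_COMM_WORLD, rank/4, rank, &subcomm);CHKERRQ(ierr);

    ierr = MatCreate(subcomm, &A);CHKERRQ(ierr);
    /* ... load or assemble the coarse matrix on subcomm here ... */

    ierr = KSPCreate(subcomm, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* -ksp_type preonly -pc_type lu ... */
    /* ... repeated KSPSolve() with vectors created on subcomm ... */

    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }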
From gokhalen at gmail.com Fri Dec 21 15:16:29 2012 From: gokhalen at gmail.com (Nachiket Gokhale) Date: Fri, 21 Dec 2012 16:16:29 -0500 Subject: [petsc-users] getting a sub matrix from a matrix Message-ID: I have a dense matrix A (100x100) and I want to extract a matrix B from it consisting of the first N columns of A. Is there a better way to do it than getting the column using MatGetColumnVector, followed by VecGetArray, and MatSetValues? It could also be done using MatGetSubMatrix but is seems to be more involved. Thanks, -Nachiket -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Dec 21 15:34:25 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 21 Dec 2012 15:34:25 -0600 Subject: [petsc-users] getting a sub matrix from a matrix In-Reply-To: References: Message-ID: <609E3A9E-766B-4A02-AF6C-0E4D39CBF0FC@mcs.anl.gov> On Dec 21, 2012, at 3:16 PM, Nachiket Gokhale wrote: > I have a dense matrix A (100x100) and I want to extract a matrix B from it consisting of the first N columns of A. Is there a better way to do it than getting the column using MatGetColumnVector, followed by VecGetArray, and MatSetValues? It could also be done using MatGetSubMatrix but is seems to be more involved. MatGetSubMatrix() is exactly for this purpose and should not be particularly involved. Use ISCreateStride() to create an IS to indicate all the rows and and another ISCreateStride to indicate the 0 to N-1 columns. Barry > > Thanks, > > -Nachiket From s_g at berkeley.edu Sun Dec 23 18:48:36 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Sun, 23 Dec 2012 16:48:36 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve Message-ID: <50D7A664.6080802@berkeley.edu> I wanted to use SuperLU Dist to perform a direct solve but seem to be encountering a problem. I was wonder if this is a know issue and if there is a solution for it. The problem is easily observed using ex6.c in src/ksp/ksp/examples/tests. Out of the box: make runex6 produces a residual error of O(1e-11), all is well. I then changed the run to run on two processors and add the flag -pc_factor_mat_solver_package spooles this produces a residual error of O(1e-11), all is still well. I then switch over to -pc_factor_mat_solver_package superlu_dist and the residual error comes back as 22.6637! Something seems very wrong. My build is perfectly vanilla: export PETSC_DIR=/Users/sg/petsc-3.3-p5/ export PETSC_ARCH=intel ./configure --with-cc=icc --with-fc=ifort \ -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test -sanjay -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sun Dec 23 18:56:36 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sun, 23 Dec 2012 18:56:36 -0600 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50D7A664.6080802@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> Message-ID: Where is your matrix? It might be ending up with a very bad pivot. If the problem can be reproduced, it should be reported to the SuperLU_DIST developers to fix. (Note that we do not see this with other matrices.) You can also try MUMPS. On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee wrote: > I wanted to use SuperLU Dist to perform a direct solve but seem to be > encountering > a problem.
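As a follow-up to Barry's MatGetSubMatrix()/ISCreateStride() suggestion in the sub-matrix thread above, a small uniprocess sketch with petsc-3.3 calling sequences; the helper name ExtractFirstColumns is an illustrative assumption, and in parallel each process would instead list only the rows and columns it is to own in the submatrix:

#include <petscmat.h>

/* B = the first N columns of A (all rows kept), built from two stride index sets. */
PetscErrorCode ExtractFirstColumns(Mat A, PetscInt N, Mat *B)
{
  PetscInt       nrows, ncols;
  IS             isrow, iscol;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatGetSize(A, &nrows, &ncols);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, nrows, 0, 1, &isrow);CHKERRQ(ierr); /* all rows       */
  ierr = ISCreateStride(PETSC_COMM_SELF, N, 0, 1, &iscol);CHKERRQ(ierr);     /* columns 0..N-1 */
  ierr = MatGetSubMatrix(A, isrow, iscol, MAT_INITIAL_MATRIX, B);CHKERRQ(ierr);
  ierr = ISDestroy(&isrow);CHKERRQ(ierr);
  ierr = ISDestroy(&iscol);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

For the 100x100 dense case in the question this replaces the per-column MatGetColumnVector/VecGetArray/MatSetValues loop with a single call.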
I was wonder if this is a know issue and if there is a > solution for it. > > The problem is easily observed using ex6.c in src/ksp/ksp/examples/tests. > > Out of the box: make runex6 produces a residual error of O(1e-11), all is > well. > > I then changed the run to run on two processors and add the flag > -pc_factor_mat_solver_package spooles this produces a residual error of > O(1e-11), all is still well. > > I then switch over to -pc_factor_mat_solver_package superlu_dist and the > residual error comes back as 22.6637! Something seems very wrong. > > My build is perfectly vanilla: > > export PETSC_DIR=/Users/sg/petsc-3.3-p5/ > export PETSC_ARCH=intel > > ./configure --with-cc=icc --with-fc=ifort \ > -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} > > make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all > make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test > > -sanjay > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Sun Dec 23 19:08:37 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Sun, 23 Dec 2012 17:08:37 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> Message-ID: <50D7AB15.5040606@berkeley.edu> Not sure what you mean by where is your matrix? I am simply running ex6 in the ksp/examples/tests directory. The reason I ran this test is because I was seeing the same behavior with my finite element code (on perfectly benign problems). Is there a built-in test that you use to check that superlu_dist is working properly with petsc? i.e. something you know that works with with petsc 3.3-p5? -sanjay On 12/23/12 4:56 PM, Jed Brown wrote: > Where is your matrix? It might be ending up with a very bad pivot. If > the problem can be reproduced, it should be reported to the > SuperLU_DIST developers to fix. (Note that we do not see this with > other matrices.) You can also try MUMPS. > > > On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee > wrote: > > I wanted to use SuperLU Dist to perform a direct solve but seem to > be encountering > a problem. I was wonder if this is a know issue and if there is a > solution for it. > > The problem is easily observed using ex6.c in > src/ksp/ksp/examples/tests. > > Out of the box: make runex6 produces a residual error of O(1e-11), > all is well. > > I then changed the run to run on two processors and add the flag > -pc_factor_mat_solver_package spooles this produces a residual > error of O(1e-11), all is still well. > > I then switch over to -pc_factor_mat_solver_package superlu_dist > and the > residual error comes back as 22.6637! Something seems very wrong. 
> > My build is perfectly vanilla: > > export PETSC_DIR=/Users/sg/petsc-3.3-p5/ > export PETSC_ARCH=intel > > ./configure --with-cc=icc --with-fc=ifort \ > -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} > > make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all > make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test > > -sanjay > > -- ----------------------------------------------- Sanjay Govindjee, PhD, PE Professor of Civil Engineering Vice Chair for Academic Affairs 779 Davis Hall Structural Engineering, Mechanics and Materials Department of Civil Engineering University of California Berkeley, CA 94720-1710 Voice: +1 510 642 6060 FAX: +1 510 643 5264 s_g at berkeley.edu http://www.ce.berkeley.edu/~sanjay ----------------------------------------------- New Books: Engineering Mechanics of Deformable Solids: A Presentation with Exercises http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 http://ukcatalogue.oup.com/product/9780199651641.do http://amzn.com/0199651647 Engineering Mechanics 3 (Dynamics) http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 http://amzn.com/3642140181 ----------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Dec 23 19:58:37 2012 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 23 Dec 2012 20:58:37 -0500 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50D7AB15.5040606@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> Message-ID: On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee wrote: > Not sure what you mean by where is your matrix? I am simply running ex6 > in the ksp/examples/tests directory. > > The reason I ran this test is because I was seeing the same behavior with > my finite element code (on perfectly benign problems). > > Is there a built-in test that you use to check that superlu_dist is > working properly with petsc? > i.e. something you know that works with with petsc 3.3-p5? > 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian 2) Compare with MUMPS Matt > -sanjay > > > > On 12/23/12 4:56 PM, Jed Brown wrote: > > Where is your matrix? It might be ending up with a very bad pivot. If the > problem can be reproduced, it should be reported to the SuperLU_DIST > developers to fix. (Note that we do not see this with other matrices.) You > can also try MUMPS. > > > On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee wrote: > >> I wanted to use SuperLU Dist to perform a direct solve but seem to be >> encountering >> a problem. I was wonder if this is a know issue and if there is a >> solution for it. >> >> The problem is easily observed using ex6.c in src/ksp/ksp/examples/tests. >> >> Out of the box: make runex6 produces a residual error of O(1e-11), all is >> well. >> >> I then changed the run to run on two processors and add the flag >> -pc_factor_mat_solver_package spooles this produces a residual error of >> O(1e-11), all is still well. >> >> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >> residual error comes back as 22.6637! Something seems very wrong. 
>> >> My build is perfectly vanilla: >> >> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >> export PETSC_ARCH=intel >> >> ./configure --with-cc=icc --with-fc=ifort \ >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >> >> -sanjay >> > > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice: +1 510 642 6060 > FAX: +1 510 643 5264s_g at berkeley.eduhttp://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exerciseshttp://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641http://ukcatalogue.oup.com/product/9780199651641.dohttp://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics)http://www.springer.com/materials/mechanics/book/978-3-642-14018-1http://amzn.com/3642140181 > > ----------------------------------------------- > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Sun Dec 23 20:01:35 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Sun, 23 Dec 2012 18:01:35 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> Message-ID: <50D7B77F.5010306@berkeley.edu> Would it be acceptable to use SPOOLES for the comparison? or is MUMPS needed (I would like to avoid re-doing my installation). On 12/23/12 5:58 PM, Matthew Knepley wrote: > On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee > wrote: > > Not sure what you mean by where is your matrix? I am simply > running ex6 in the ksp/examples/tests directory. > > The reason I ran this test is because I was seeing the same > behavior with my finite element code (on perfectly benign problems). > > Is there a built-in test that you use to check that superlu_dist > is working properly with petsc? > i.e. something you know that works with with petsc 3.3-p5? > > > 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian > > 2) Compare with MUMPS > > Matt > > -sanjay > > > > On 12/23/12 4:56 PM, Jed Brown wrote: >> Where is your matrix? It might be ending up with a very bad >> pivot. If the problem can be reproduced, it should be reported to >> the SuperLU_DIST developers to fix. (Note that we do not see this >> with other matrices.) You can also try MUMPS. >> >> >> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >> > wrote: >> >> I wanted to use SuperLU Dist to perform a direct solve but >> seem to be encountering >> a problem. I was wonder if this is a know issue and if there >> is a solution for it. >> >> The problem is easily observed using ex6.c in >> src/ksp/ksp/examples/tests. >> >> Out of the box: make runex6 produces a residual error of >> O(1e-11), all is well. >> >> I then changed the run to run on two processors and add the flag >> -pc_factor_mat_solver_package spooles this produces a >> residual error of O(1e-11), all is still well. 
>> >> I then switch over to -pc_factor_mat_solver_package >> superlu_dist and the >> residual error comes back as 22.6637! Something seems very wrong. >> >> My build is perfectly vanilla: >> >> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >> export PETSC_ARCH=intel >> >> ./configure --with-cc=icc --with-fc=ifort \ >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >> >> -sanjay >> >> > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice:+1 510 642 6060 > FAX:+1 510 643 5264 > s_g at berkeley.edu > http://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exercises > http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 > http://ukcatalogue.oup.com/product/9780199651641.do > http://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics) > http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 > http://amzn.com/3642140181 > > ----------------------------------------------- > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- ----------------------------------------------- Sanjay Govindjee, PhD, PE Professor of Civil Engineering Vice Chair for Academic Affairs 779 Davis Hall Structural Engineering, Mechanics and Materials Department of Civil Engineering University of California Berkeley, CA 94720-1710 Voice: +1 510 642 6060 FAX: +1 510 643 5264 s_g at berkeley.edu http://www.ce.berkeley.edu/~sanjay ----------------------------------------------- New Books: Engineering Mechanics of Deformable Solids: A Presentation with Exercises http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 http://ukcatalogue.oup.com/product/9780199651641.do http://amzn.com/0199651647 Engineering Mechanics 3 (Dynamics) http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 http://amzn.com/3642140181 ----------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sun Dec 23 20:07:23 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sun, 23 Dec 2012 20:07:23 -0600 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50D7AB15.5040606@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> Message-ID: You didn't say what options you were running ex6 with, but with the options used for the tests, I see ~/petsc/src/ksp/ksp/examples/tests$ mpirun.hydra -n 2 ./ex6 -f ~/petsc/datafiles/matrices/arco1 -pc_type lu -pc_factor_mat_solver_package superlu_dist Number of iterations = 1 Residual norm = 2.23439e-11 You need to give precise instructions for how to reproduce the behavior you are seeing. Also, for experimenting with matrices read from files, we prefer src/ksp/ksp/examples/tutorials/ex10.c because it is better commented and has more features. 
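For anyone following along, a stand-alone sketch (not the actual ex6/ex10 source) of the same experiment written against the petsc-3.3 interfaces: load a binary matrix and do a preonly/LU direct solve with an explicitly chosen factorization package. The file name "arco1" and the all-ones right-hand side are placeholder assumptions, and the package can just as well be picked at run time with -pc_factor_mat_solver_package:

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            b, x;
  KSP            ksp;
  PC             pc;
  PetscViewer    fd;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);
  /* Read a matrix stored in PETSc binary format, e.g. the arco1 test matrix. */
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "arco1", FILE_MODE_READ, &fd);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatLoad(A, fd);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&fd);CHKERRQ(ierr);

  ierr = MatGetVecs(A, &x, &b);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);  /* placeholder right-hand side */

  /* Direct solve: the KSP does no iteration, the LU factorization does all the work. */
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Switching the last argument of PCFactorSetMatSolverPackage() to MATSOLVERMUMPS gives the MUMPS comparison suggested elsewhere in the thread.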
On Sun, Dec 23, 2012 at 7:08 PM, Sanjay Govindjee wrote: > Not sure what you mean by where is your matrix? I am simply running ex6 > in the ksp/examples/tests directory. > > The reason I ran this test is because I was seeing the same behavior with > my finite element code (on perfectly benign problems). > > Is there a built-in test that you use to check that superlu_dist is > working properly with petsc? > i.e. something you know that works with with petsc 3.3-p5? > > -sanjay > > > > On 12/23/12 4:56 PM, Jed Brown wrote: > > Where is your matrix? It might be ending up with a very bad pivot. If the > problem can be reproduced, it should be reported to the SuperLU_DIST > developers to fix. (Note that we do not see this with other matrices.) You > can also try MUMPS. > > > On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee wrote: > >> I wanted to use SuperLU Dist to perform a direct solve but seem to be >> encountering >> a problem. I was wonder if this is a know issue and if there is a >> solution for it. >> >> The problem is easily observed using ex6.c in src/ksp/ksp/examples/tests. >> >> Out of the box: make runex6 produces a residual error of O(1e-11), all is >> well. >> >> I then changed the run to run on two processors and add the flag >> -pc_factor_mat_solver_package spooles this produces a residual error of >> O(1e-11), all is still well. >> >> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >> residual error comes back as 22.6637! Something seems very wrong. >> >> My build is perfectly vanilla: >> >> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >> export PETSC_ARCH=intel >> >> ./configure --with-cc=icc --with-fc=ifort \ >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >> >> -sanjay >> > > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice: +1 510 642 6060 > FAX: +1 510 643 5264s_g at berkeley.eduhttp://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exerciseshttp://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641http://ukcatalogue.oup.com/product/9780199651641.dohttp://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics)http://www.springer.com/materials/mechanics/book/978-3-642-14018-1http://amzn.com/3642140181 > > ----------------------------------------------- > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Sun Dec 23 20:15:34 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Sun, 23 Dec 2012 18:15:34 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> Message-ID: <50D7BAC6.2050807@berkeley.edu> Sorry for the confusion. I thought I was clear. Here is the make line I was running. 
-@${MPIEXEC} -n 2 ./ex6 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -options_left no \ -f arco1 > ex6_1.tmp 2>&1; \ if (${DIFF} output/ex6_1.out ex6_1.tmp) then true; \ else echo ${PWD} ; echo "Possible problem with with ex6_1, diffs above \n========================================="; fi; \ ${RM} -f ex6_1.tmp If you change superlu_dist to spooles it works just fine as well as any other iterative methods you care to try. The matrix arcos1 was downloaded as per the instructions in the makefile. I will try reproducing the superlu_dist error with snes/examples/tutorials/ex5 now. (fyi under snes/examples/tests/output the files ex5_1.out and ex5_2.out are missing one can not run the test out of the box). -sanjay On 12/23/12 6:07 PM, Jed Brown wrote: > You didn't say what options you were running ex6 with, but with the > options used for the tests, I see > > ~/petsc/src/ksp/ksp/examples/tests$ mpirun.hydra -n 2 ./ex6 -f > ~/petsc/datafiles/matrices/arco1 -pc_type lu > -pc_factor_mat_solver_package superlu_dist > Number of iterations = 1 > Residual norm = 2.23439e-11 > > > You need to give precise instructions for how to reproduce the > behavior you are seeing. > > Also, for experimenting with matrices read from files, we prefer > src/ksp/ksp/examples/tutorials/ex10.c because it is better commented > and has more features. > > > On Sun, Dec 23, 2012 at 7:08 PM, Sanjay Govindjee > wrote: > > Not sure what you mean by where is your matrix? I am simply > running ex6 in the ksp/examples/tests directory. > > The reason I ran this test is because I was seeing the same > behavior with my finite element code (on perfectly benign problems). > > Is there a built-in test that you use to check that superlu_dist > is working properly with petsc? > i.e. something you know that works with with petsc 3.3-p5? > > -sanjay > > > > On 12/23/12 4:56 PM, Jed Brown wrote: >> Where is your matrix? It might be ending up with a very bad >> pivot. If the problem can be reproduced, it should be reported to >> the SuperLU_DIST developers to fix. (Note that we do not see this >> with other matrices.) You can also try MUMPS. >> >> >> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >> > wrote: >> >> I wanted to use SuperLU Dist to perform a direct solve but >> seem to be encountering >> a problem. I was wonder if this is a know issue and if there >> is a solution for it. >> >> The problem is easily observed using ex6.c in >> src/ksp/ksp/examples/tests. >> >> Out of the box: make runex6 produces a residual error of >> O(1e-11), all is well. >> >> I then changed the run to run on two processors and add the flag >> -pc_factor_mat_solver_package spooles this produces a >> residual error of O(1e-11), all is still well. >> >> I then switch over to -pc_factor_mat_solver_package >> superlu_dist and the >> residual error comes back as 22.6637! Something seems very wrong. >> >> My build is perfectly vanilla: >> >> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >> export PETSC_ARCH=intel >> >> ./configure --with-cc=icc --with-fc=ifort \ >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >> >> -sanjay >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jedbrown at mcs.anl.gov Sun Dec 23 20:26:17 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sun, 23 Dec 2012 20:26:17 -0600 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50D7BAC6.2050807@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BAC6.2050807@berkeley.edu> Message-ID: On Sun, Dec 23, 2012 at 8:15 PM, Sanjay Govindjee wrote: > Sorry for the confusion. I thought I was clear. Here is the make line I > was running. > > > -@${MPIEXEC} -n 2 ./ex6 -ksp_type preonly -pc_type lu > -pc_factor_mat_solver_package superlu_dist -options_left no \ > -f arco1 > ex6_1.tmp 2>&1; \ > if (${DIFF} output/ex6_1.out ex6_1.tmp) then true; \ > else echo ${PWD} ; echo "Possible problem with with ex6_1, > diffs above \n========================================="; fi; \ > ${RM} -f ex6_1.tmp > > If you change superlu_dist to spooles it works just fine as well as any > other iterative methods you care to try. The matrix arcos1 was downloaded > as per the instructions in the makefile. > I cannot reproduce your problem. Do you have a build with a different compiler (like GCC)? Also, what BLAS/LAPACK is being used? (You can send configure.log to petsc-maint at mcs.anl.gov.) > I will try reproducing the superlu_dist error with > snes/examples/tutorials/ex5 now. > This is the file Matt suggested. > (fyi under snes/examples/tests/output the files ex5_1.out and ex5_2.out > are missing one can not run the test out of the box). > Heh, this has been missing since the beginning of time (revision 0). I'll add it. > > -sanjay > > > > On 12/23/12 6:07 PM, Jed Brown wrote: > > You didn't say what options you were running ex6 with, but with the > options used for the tests, I see > > ~/petsc/src/ksp/ksp/examples/tests$ mpirun.hydra -n 2 ./ex6 -f > ~/petsc/datafiles/matrices/arco1 -pc_type lu -pc_factor_mat_solver_package > superlu_dist > Number of iterations = 1 > Residual norm = 2.23439e-11 > > > You need to give precise instructions for how to reproduce the behavior > you are seeing. > > Also, for experimenting with matrices read from files, we prefer > src/ksp/ksp/examples/tutorials/ex10.c because it is better commented and > has more features. > > > On Sun, Dec 23, 2012 at 7:08 PM, Sanjay Govindjee wrote: > >> Not sure what you mean by where is your matrix? I am simply running ex6 >> in the ksp/examples/tests directory. >> >> The reason I ran this test is because I was seeing the same behavior with >> my finite element code (on perfectly benign problems). >> >> Is there a built-in test that you use to check that superlu_dist is >> working properly with petsc? >> i.e. something you know that works with with petsc 3.3-p5? >> >> -sanjay >> >> >> >> On 12/23/12 4:56 PM, Jed Brown wrote: >> >> Where is your matrix? It might be ending up with a very bad pivot. If the >> problem can be reproduced, it should be reported to the SuperLU_DIST >> developers to fix. (Note that we do not see this with other matrices.) You >> can also try MUMPS. >> >> >> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee wrote: >> >>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>> encountering >>> a problem. I was wonder if this is a know issue and if there is a >>> solution for it. >>> >>> The problem is easily observed using ex6.c in src/ksp/ksp/examples/tests. >>> >>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>> is well. 
>>> >>> I then changed the run to run on two processors and add the flag >>> -pc_factor_mat_solver_package spooles this produces a residual error of >>> O(1e-11), all is still well. >>> >>> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >>> residual error comes back as 22.6637! Something seems very wrong. >>> >>> My build is perfectly vanilla: >>> >>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>> export PETSC_ARCH=intel >>> >>> ./configure --with-cc=icc --with-fc=ifort \ >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>> >>> -sanjay >>> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Sun Dec 23 20:37:39 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Sun, 23 Dec 2012 18:37:39 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> Message-ID: <50D7BFF3.3030909@berkeley.edu> I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how to convert the run lines for snes/examples/ex5.c to work with a direct solver as I am not versed in SNES options. Notwithstanding something strange is happening only on select examples. With ksp/ksp/exampeles/tutorials/ex2.c and the run line: -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist I get good results (of the order): Norm of error 1.85464e-14 iterations 1 using both superlu_dist and spooles. My BLAS/LAPACK: -llapack -lblas (so native to my machine). If you can guide me on a run line for the snes ex5.c I can try that too. I'll also try to construct a GCC build later to see if that is an issue. -sanjay On 12/23/12 5:58 PM, Matthew Knepley wrote: > On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee > wrote: > > Not sure what you mean by where is your matrix? I am simply > running ex6 in the ksp/examples/tests directory. > > The reason I ran this test is because I was seeing the same > behavior with my finite element code (on perfectly benign problems). > > Is there a built-in test that you use to check that superlu_dist > is working properly with petsc? > i.e. something you know that works with with petsc 3.3-p5? > > > 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian > > 2) Compare with MUMPS > > Matt > > -sanjay > > > > On 12/23/12 4:56 PM, Jed Brown wrote: >> Where is your matrix? It might be ending up with a very bad >> pivot. If the problem can be reproduced, it should be reported to >> the SuperLU_DIST developers to fix. (Note that we do not see this >> with other matrices.) You can also try MUMPS. >> >> >> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >> > wrote: >> >> I wanted to use SuperLU Dist to perform a direct solve but >> seem to be encountering >> a problem. I was wonder if this is a know issue and if there >> is a solution for it. >> >> The problem is easily observed using ex6.c in >> src/ksp/ksp/examples/tests. >> >> Out of the box: make runex6 produces a residual error of >> O(1e-11), all is well. >> >> I then changed the run to run on two processors and add the flag >> -pc_factor_mat_solver_package spooles this produces a >> residual error of O(1e-11), all is still well. 
>> >> I then switch over to -pc_factor_mat_solver_package >> superlu_dist and the >> residual error comes back as 22.6637! Something seems very wrong. >> >> My build is perfectly vanilla: >> >> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >> export PETSC_ARCH=intel >> >> ./configure --with-cc=icc --with-fc=ifort \ >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >> >> -sanjay >> >> > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice:+1 510 642 6060 > FAX:+1 510 643 5264 > s_g at berkeley.edu > http://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exercises > http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 > http://ukcatalogue.oup.com/product/9780199651641.do > http://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics) > http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 > http://amzn.com/3642140181 > > ----------------------------------------------- > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Dec 23 20:42:56 2012 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 23 Dec 2012 21:42:56 -0500 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50D7BFF3.3030909@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> Message-ID: On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee wrote: > I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how > to convert the run lines for snes/examples/ex5.c to work with a direct > solver as I am not versed in SNES options. > > Notwithstanding something strange is happening only on select examples. > With ksp/ksp/exampeles/tutorials/ex2.c and the run line: > > -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly > -pc_type lu -pc_factor_mat_solver_package superlu_dist > > I get good results (of the order): > > Norm of error 1.85464e-14 iterations 1 > > using both superlu_dist and spooles. > > My BLAS/LAPACK: -llapack -lblas (so native to my machine). > > If you can guide me on a run line for the snes ex5.c I can try that too. > I'll also try to construct a GCC build later to see if that is an issue. > Same line on ex5, but ex2 is good enough. However, it will not tell us anything new. Try another build. Matt > -sanjay > > > On 12/23/12 5:58 PM, Matthew Knepley wrote: > > On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee wrote: > >> Not sure what you mean by where is your matrix? I am simply running ex6 >> in the ksp/examples/tests directory. >> >> The reason I ran this test is because I was seeing the same behavior with >> my finite element code (on perfectly benign problems). 
>> >> Is there a built-in test that you use to check that superlu_dist is >> working properly with petsc? >> i.e. something you know that works with with petsc 3.3-p5? >> > > 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian > > 2) Compare with MUMPS > > Matt > > >> -sanjay >> >> >> >> On 12/23/12 4:56 PM, Jed Brown wrote: >> >> Where is your matrix? It might be ending up with a very bad pivot. If the >> problem can be reproduced, it should be reported to the SuperLU_DIST >> developers to fix. (Note that we do not see this with other matrices.) You >> can also try MUMPS. >> >> >> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee wrote: >> >>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>> encountering >>> a problem. I was wonder if this is a know issue and if there is a >>> solution for it. >>> >>> The problem is easily observed using ex6.c in src/ksp/ksp/examples/tests. >>> >>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>> is well. >>> >>> I then changed the run to run on two processors and add the flag >>> -pc_factor_mat_solver_package spooles this produces a residual error of >>> O(1e-11), all is still well. >>> >>> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >>> residual error comes back as 22.6637! Something seems very wrong. >>> >>> My build is perfectly vanilla: >>> >>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>> export PETSC_ARCH=intel >>> >>> ./configure --with-cc=icc --with-fc=ifort \ >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>> >>> -sanjay >>> >> >> >> -- >> ----------------------------------------------- >> Sanjay Govindjee, PhD, PE >> Professor of Civil Engineering >> Vice Chair for Academic Affairs >> >> 779 Davis Hall >> Structural Engineering, Mechanics and Materials >> Department of Civil Engineering >> University of California >> Berkeley, CA 94720-1710 >> >> Voice: +1 510 642 6060 >> FAX: +1 510 643 5264s_g at berkeley.eduhttp://www.ce.berkeley.edu/~sanjay >> ----------------------------------------------- >> >> New Books: >> >> Engineering Mechanics of Deformable >> Solids: A Presentation with Exerciseshttp://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641http://ukcatalogue.oup.com/product/9780199651641.dohttp://amzn.com/0199651647 >> >> >> Engineering Mechanics 3 (Dynamics)http://www.springer.com/materials/mechanics/book/978-3-642-14018-1http://amzn.com/3642140181 >> >> ----------------------------------------------- >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Mon Dec 24 10:58:54 2012 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 24 Dec 2012 10:58:54 -0600 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> Message-ID: Sanjay, Which version of superlu_dist do you use? 
I configured my petsc-3.3 with '--download-superlu_dist' which installs SuperLU_DIST_3.1. Then I get petsc-3.3/src/ksp/ksp/examples/tests mpiexec -n 2 ./ex6 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -options_left no -f $D/arco1 Number of iterations = 1 Residual norm = 2.00484e-11 Hong On Sun, Dec 23, 2012 at 8:42 PM, Matthew Knepley wrote: > > On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee wrote: >> >> I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how >> to convert the run lines for snes/examples/ex5.c to work with a direct >> solver as I am not versed in SNES options. >> >> Notwithstanding something strange is happening only on select examples. >> With ksp/ksp/exampeles/tutorials/ex2.c and the run line: >> >> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly >> -pc_type lu -pc_factor_mat_solver_package superlu_dist >> >> I get good results (of the order): >> >> Norm of error 1.85464e-14 iterations 1 >> >> using both superlu_dist and spooles. >> >> My BLAS/LAPACK: -llapack -lblas (so native to my machine). >> >> If you can guide me on a run line for the snes ex5.c I can try that too. >> I'll also try to construct a GCC build later to see if that is an issue. > > > Same line on ex5, but ex2 is good enough. However, it will not tell us > anything new. Try another build. > > Matt > >> >> -sanjay >> >> >> On 12/23/12 5:58 PM, Matthew Knepley wrote: >> >> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >> wrote: >>> >>> Not sure what you mean by where is your matrix? I am simply running ex6 >>> in the ksp/examples/tests directory. >>> >>> The reason I ran this test is because I was seeing the same behavior with >>> my finite element code (on perfectly benign problems). >>> >>> Is there a built-in test that you use to check that superlu_dist is >>> working properly with petsc? >>> i.e. something you know that works with with petsc 3.3-p5? >> >> >> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >> >> 2) Compare with MUMPS >> >> Matt >> >>> >>> -sanjay >>> >>> >>> >>> On 12/23/12 4:56 PM, Jed Brown wrote: >>> >>> Where is your matrix? It might be ending up with a very bad pivot. If the >>> problem can be reproduced, it should be reported to the SuperLU_DIST >>> developers to fix. (Note that we do not see this with other matrices.) You >>> can also try MUMPS. >>> >>> >>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>> wrote: >>>> >>>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>>> encountering >>>> a problem. I was wonder if this is a know issue and if there is a >>>> solution for it. >>>> >>>> The problem is easily observed using ex6.c in >>>> src/ksp/ksp/examples/tests. >>>> >>>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>>> is well. >>>> >>>> I then changed the run to run on two processors and add the flag >>>> -pc_factor_mat_solver_package spooles this produces a residual error of >>>> O(1e-11), all is still well. >>>> >>>> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >>>> residual error comes back as 22.6637! Something seems very wrong. 
>>>> >>>> My build is perfectly vanilla: >>>> >>>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>>> export PETSC_ARCH=intel >>>> >>>> ./configure --with-cc=icc --with-fc=ifort \ >>>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>>> >>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>>> >>>> -sanjay >>> >>> >>> >>> -- >>> ----------------------------------------------- >>> Sanjay Govindjee, PhD, PE >>> Professor of Civil Engineering >>> Vice Chair for Academic Affairs >>> >>> 779 Davis Hall >>> Structural Engineering, Mechanics and Materials >>> Department of Civil Engineering >>> University of California >>> Berkeley, CA 94720-1710 >>> >>> Voice: +1 510 642 6060 >>> FAX: +1 510 643 5264 >>> s_g at berkeley.edu >>> http://www.ce.berkeley.edu/~sanjay >>> ----------------------------------------------- >>> >>> New Books: >>> >>> Engineering Mechanics of Deformable >>> Solids: A Presentation with Exercises >>> >>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>> http://ukcatalogue.oup.com/product/9780199651641.do >>> http://amzn.com/0199651647 >>> >>> >>> Engineering Mechanics 3 (Dynamics) >>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>> http://amzn.com/3642140181 >>> >>> ----------------------------------------------- >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From abarua at iit.edu Wed Dec 26 01:00:50 2012 From: abarua at iit.edu (amlan barua) Date: Wed, 26 Dec 2012 01:00:50 -0600 Subject: [petsc-users] Question on TS Message-ID: Hi, Greetings to the team! I am currently using PETSc for my research. Here is a brief description of my problem and my query a) I have a set a points distributed on a 3 dimensional lattice. b) Corresponding to each point in this set, 7 odes are defined. c) Of these 7 odes, 6 are uncoupled but one is coupled to nearest neighbors. d) To integrate the odes I am using PETSc's DMDA and TS. But my application needs implicit as well as locally high order solver. I am looking for an implicit RK4 type method. Does PETSc have an IRK4 support or equivalent? e) Suppose I want to build my own implicit time stepper. Should I imitate ex2.c of SNES solver? Thanks Amlan IISER Pune, India -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Wed Dec 26 10:24:57 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Wed, 26 Dec 2012 10:24:57 -0600 Subject: [petsc-users] Question on TS In-Reply-To: References: Message-ID: On Wed, Dec 26, 2012 at 1:00 AM, amlan barua wrote: > Hi, > Greetings to the team! I am currently using PETSc for my research. Here is > a brief description of my problem and my query > a) I have a set a points distributed on a 3 dimensional lattice. > b) Corresponding to each point in this set, 7 odes are defined. > c) Of these 7 odes, 6 are uncoupled but one is coupled to nearest > neighbors. > I suggest not optimizing for "missing" coupling to start with. We can do the optimization in the solver, perhaps by splitting the DMDA into the local and coupled parts. 
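Going back to the DMDACreate3d question earlier in this digest, a minimal petsc-3.3-style sketch of Barry's 300-points-on-4-processes answer, shown in 1d for brevity; the 3d call is analogous, taking lx[], ly[], lz[] of lengths m, n, p whose entries sum to M, N, P. The grid size and the evenly split lx[] values come straight from the exchange; everything else is an assumption:

#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM             da;
  PetscInt       lx[4] = {75, 75, 75, 75}; /* 4 processes x 75 points = 300 grid points */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);
  /* Run with exactly 4 MPI processes; every rank must pass the identical lx[] array. */
  ierr = DMDACreate1d(PETSC_COMM_WORLD, DMDA_BOUNDARY_NONE, 300, 1, 1, lx, &da);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Passing PETSC_NULL instead of lx gives the default even split, exactly as Jed notes above.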
> d) To integrate the odes I am using PETSc's DMDA and TS. But my > application needs implicit as well as locally high order solver. I am > looking for an implicit RK4 type method. Does PETSc have an IRK4 support or > equivalent? > If you are happy with a diagonally implicit method, you can use TSARKIMEX (these integrators can be IMEX, but can also do any diagonally implicit method). If you want a fully implicit RK (like Gauss, Radau IIA, etc) then all stages are coupled together. Those methods are not currently implemented in PETSc, though you could implement it either as a new TS implementation (good for code reuse; you can do this outside of PETSc, but the code you write is like library code) or manually using SNES (not reusable). > e) Suppose I want to build my own implicit time stepper. Should I imitate > ex2.c of SNES solver? > Thanks > Amlan > IISER Pune, India > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Dec 26 11:02:58 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 26 Dec 2012 11:02:58 -0600 Subject: [petsc-users] Question on TS In-Reply-To: References: Message-ID: <598DFAB9-9075-4EB1-B1A7-26CCBE4414F1@mcs.anl.gov> On Dec 26, 2012, at 10:24 AM, Jed Brown wrote: > On Wed, Dec 26, 2012 at 1:00 AM, amlan barua wrote: > Hi, > Greetings to the team! I am currently using PETSc for my research. Here is a brief description of my problem and my query > a) I have a set a points distributed on a 3 dimensional lattice. > b) Corresponding to each point in this set, 7 odes are defined. > c) Of these 7 odes, 6 are uncoupled but one is coupled to nearest neighbors. > > I suggest not optimizing for "missing" coupling to start with. We can do the optimization in the solver, perhaps by splitting the DMDA into the local and coupled parts. I agree with Jed here. Coincidently I am working on a similar problem but with thousands of ODEs (mostly decoupled). You can use DMDASetBlockFills(), the ofill parameter to indicate exactly what fields are coupled to neighbors and which are not, this reduces the unneeded zero Jacobian entries (you can also use the dfill parameter to reduce unneeded zero entries in the 7 by 7 block). Eventually we'll use the same information to reduce the ghost point communication also. Barry > > d) To integrate the odes I am using PETSc's DMDA and TS. But my application needs implicit as well as locally high order solver. I am looking for an implicit RK4 type method. Does PETSc have an IRK4 support or equivalent? > > If you are happy with a diagonally implicit method, you can use TSARKIMEX (these integrators can be IMEX, but can also do any diagonally implicit method). > > If you want a fully implicit RK (like Gauss, Radau IIA, etc) then all stages are coupled together. Those methods are not currently implemented in PETSc, though you could implement it either as a new TS implementation (good for code reuse; you can do this outside of PETSc, but the code you write is like library code) or manually using SNES (not reusable). > > e) Suppose I want to build my own implicit time stepper. Should I imitate ex2.c of SNES solver? > Thanks > Amlan > IISER Pune, India > From z240w014 at ku.edu Wed Dec 26 12:05:29 2012 From: z240w014 at ku.edu (Zhenglun (Alan) Wei) Date: Wed, 26 Dec 2012 12:05:29 -0600 Subject: [petsc-users] A quick question on DMDACreate3d Message-ID: <50DB3C69.9060806@ku.edu> Dear folks, I have a quick question on the DMDACreate3d. 
In the manual, it says that the input format of this function is: PetscErrorCode DMDACreate3d(MPI_Comm comm,DMDABoundaryType bx,DMDABoundaryType by,DMDABoundaryType bz,DMDAStencilType stencil_type,PetscInt M, PetscInt N,PetscInt P,PetscInt m,PetscInt n,PetscInt p,PetscInt dof,PetscInt s,const PetscInt lx[],const PetscInt ly[],const PetscInt lz[],DM *da) Now, I'm trying to manually define the "arrays containing the number of nodes in each cell along the x, y, and z coordinates". Therefore, my focus turns to 'lx[]', 'ly[]' and 'lz[]'. I suppose that they're not simply just three integers; they may be three integer type arrays, as I guess. However, I checked all examples listed for this function. None of them teaches me how to implement this three parameters except 'PETSC_NULL'. Could you please provide me an extra example to demonstrate how to use DMDACreate3d or DMDACreate2d with non-null 'lx[]', 'ly[]' and 'lz[]'. Or, a demonstration in 1D would be a good example. Say, I have a 1D uniform mesh; the number of grid in x-direction is 300. I want to use 4 processes to evenly divide this mesh. What should I input for 'lx[]' for each process? thank you so much and Happy New Year!! :) Alan -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarua at iit.edu Wed Dec 26 12:25:28 2012 From: abarua at iit.edu (amlan barua) Date: Wed, 26 Dec 2012 12:25:28 -0600 Subject: [petsc-users] Question on TS In-Reply-To: <598DFAB9-9075-4EB1-B1A7-26CCBE4414F1@mcs.anl.gov> References: <598DFAB9-9075-4EB1-B1A7-26CCBE4414F1@mcs.anl.gov> Message-ID: Hi, Thanks to Barry and Jed. I might come back later with few other questions. Amlan On Wed, Dec 26, 2012 at 11:02 AM, Barry Smith wrote: > > On Dec 26, 2012, at 10:24 AM, Jed Brown wrote: > > > On Wed, Dec 26, 2012 at 1:00 AM, amlan barua wrote: > > Hi, > > Greetings to the team! I am currently using PETSc for my research. Here > is a brief description of my problem and my query > > a) I have a set a points distributed on a 3 dimensional lattice. > > b) Corresponding to each point in this set, 7 odes are defined. > > c) Of these 7 odes, 6 are uncoupled but one is coupled to nearest > neighbors. > > > > I suggest not optimizing for "missing" coupling to start with. We can do > the optimization in the solver, perhaps by splitting the DMDA into the > local and coupled parts. > > I agree with Jed here. Coincidently I am working on a similar problem > but with thousands of ODEs (mostly decoupled). You can use > DMDASetBlockFills(), the ofill parameter to indicate exactly what fields > are coupled to neighbors and which are not, this reduces the unneeded zero > Jacobian entries (you can also use the dfill parameter to reduce unneeded > zero entries in the 7 by 7 block). Eventually we'll use the same > information to reduce the ghost point communication also. > > Barry > > > > > d) To integrate the odes I am using PETSc's DMDA and TS. But my > application needs implicit as well as locally high order solver. I am > looking for an implicit RK4 type method. Does PETSc have an IRK4 support or > equivalent? > > > > If you are happy with a diagonally implicit method, you can use > TSARKIMEX (these integrators can be IMEX, but can also do any diagonally > implicit method). > > > > If you want a fully implicit RK (like Gauss, Radau IIA, etc) then all > stages are coupled together. 
Those methods are not currently implemented in > PETSc, though you could implement it either as a new TS implementation > (good for code reuse; you can do this outside of PETSc, but the code you > write is like library code) or manually using SNES (not reusable). > > > > e) Suppose I want to build my own implicit time stepper. Should I > imitate ex2.c of SNES solver? > > Thanks > > Amlan > > IISER Pune, India > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Dec 26 12:27:50 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 26 Dec 2012 12:27:50 -0600 Subject: [petsc-users] A quick question on DMDACreate3d In-Reply-To: <50DB3C69.9060806@ku.edu> References: <50DB3C69.9060806@ku.edu> Message-ID: <3881B9C6-0989-4CE2-8A92-8BC297DC1E2F@mcs.anl.gov> On Dec 26, 2012, at 12:05 PM, "Zhenglun (Alan) Wei" wrote: > Dear folks, > I have a quick question on the DMDACreate3d. > In the manual, it says that the input format of this function is: > PetscErrorCode DMDACreate3d(MPI_Comm comm,DMDABoundaryType bx,DMDABoundaryType by,DMDABoundaryType bz,DMDAStencilType stencil_type,PetscInt M, > PetscInt N,PetscInt P,PetscInt m,PetscInt n,PetscInt p,PetscInt dof,PetscInt s,const PetscInt lx[],const PetscInt ly[],const PetscInt lz[],DM *da) > > > Now, I'm trying to manually define the "arrays containing the number of nodes in each cell along the x, y, and z coordinates". Therefore, my focus turns to 'lx[]', 'ly[]' and 'lz[]'. I suppose that they're not simply just three integers; they may be three integer type arrays, as I guess. However, I checked all examples listed for this function. None of them teaches me how to implement this three parameters except 'PETSC_NULL'. Could you please provide me an extra example to demonstrate how to use DMDACreate3d or DMDACreate2d with non-null 'lx[]', 'ly[]' and 'lz[]'. > Or, a demonstration in 1D would be a good example. Say, I have a 1D uniform mesh; the number of grid in x-direction is 300. I want to use 4 processes to evenly divide this mesh. What should I input for 'lx[]' for each process? If you use lx of PETSC_NULL it will default to putting 75 points on each process. Manually you would declare lx[4] and set lx[0] = lx[1] = lx[2] = lx[3] =75. Note that all processes need to provide the exact same values in lx, ly and lz > > thank you so much and Happy New Year!! :) > Alan > > > From jedbrown at mcs.anl.gov Wed Dec 26 12:28:43 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Wed, 26 Dec 2012 12:28:43 -0600 Subject: [petsc-users] A quick question on DMDACreate3d In-Reply-To: <50DB3C69.9060806@ku.edu> References: <50DB3C69.9060806@ku.edu> Message-ID: On Wed, Dec 26, 2012 at 12:05 PM, Zhenglun (Alan) Wei wrote: > Dear folks, > I have a quick question on the DMDACreate3d. > In the manual, it says that the input format of this function is: > > PetscErrorCode DMDACreate3d(MPI_Comm comm,DMDABoundaryType bx,DMDABoundaryType by,DMDABoundaryType bz,DMDAStencilType stencil_type,PetscInt M, > PetscInt N,PetscInt P,PetscInt m,PetscInt n,PetscInt p,PetscInt dof,PetscInt s,const PetscInt lx[],const PetscInt ly[],const PetscInt lz[],DM *da) > > > Now, I'm trying to manually define the "arrays containing the number > of nodes in each cell along the x, y, and z coordinates". Therefore, my > focus turns to 'lx[]', 'ly[]' and 'lz[]'. I suppose that they're not simply > just three integers; they may be three integer type arrays, as I guess. > However, I checked all examples listed for this function. 
None of them > teaches me how to implement this three parameters except 'PETSC_NULL'. > Could you please provide me an extra example to demonstrate how to use > DMDACreate3d or DMDACreate2d with non-null 'lx[]', 'ly[]' and 'lz[]'. > It is used by snes/examples/tutorials/ex28.c in the 1D case to ensure that the staggered grid has a specific compatible layout. As the docs say, these are arrays of length m,n,p and must sum to M, N, and P. Or, a demonstration in 1D would be a good example. Say, I have a 1D > uniform mesh; the number of grid in x-direction is 300. I want to use 4 > processes to evenly divide this mesh. What should I input for 'lx[]' for > each process? > > The defaults do this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Wed Dec 26 15:13:54 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 26 Dec 2012 13:13:54 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> Message-ID: <50DB6892.5040402@berkeley.edu> I have done some more testing of the problem, continuing with src/ksp/ksp/examples/tutorials/ex2.c. The behavior I am seeing is that with smaller problems sizes superlu_dist is behaving properly but with larger problem sizes things seem to go wrong and what goes wrong is apparently consistent; the error appears both with my intel build as well as with my gcc build. I have two run lines: runex2superlu: -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist runex2spooles: -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package spooles From my intel build, I get sg-macbook-prolocal:tutorials sg$ make runex2superlu Norm of error 7.66145e-13 iterations 1 sg-macbook-prolocal:tutorials sg$ make runex2spooles Norm of error 2.21422e-12 iterations 1 From my GCC build, I get sg-macbook-prolocal:tutorials sg$ make runex2superlu Norm of error 7.66145e-13 iterations 1 sg-macbook-prolocal:tutorials sg$ make runex2spooles Norm of error 2.21422e-12 iterations 1 If I change the -m 100 -n 100 to -m 500 -n 500, I get for my intel build sg-macbook-prolocal:tutorials sg$ make runex2superlu Norm of error 419.953 iterations 1 sg-macbook-prolocal:tutorials sg$ make runex2spooles Norm of error 2.69468e-10 iterations 1 From my GCC build with -m 500 -n 500, I get sg-macbook-prolocal:tutorials sg$ make runex2superlu Norm of error 419.953 iterations 1 sg-macbook-prolocal:tutorials sg$ make runex2spooles Norm of error 2.69468e-10 iterations 1 Any suggestions will be greatly appreciated. -sanjay On 12/23/12 6:42 PM, Matthew Knepley wrote: > > On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee > wrote: > > I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was > unsure how to convert the run lines for snes/examples/ex5.c to > work with a direct solver as I am not versed in SNES options. > > Notwithstanding something strange is happening only on select > examples. With ksp/ksp/exampeles/tutorials/ex2.c and the run line: > > -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type > preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist > > I get good results (of the order): > > Norm of error 1.85464e-14 iterations 1 > > using both superlu_dist and spooles. > > My BLAS/LAPACK: -llapack -lblas (so native to my machine). 
> > If you can guide me on a run line for the snes ex5.c I can try > that too. I'll also try to construct a GCC build later to see if > that is an issue. > > > Same line on ex5, but ex2 is good enough. However, it will not tell us > anything new. Try another build. > > Matt > > -sanjay > > > On 12/23/12 5:58 PM, Matthew Knepley wrote: >> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >> > wrote: >> >> Not sure what you mean by where is your matrix? I am simply >> running ex6 in the ksp/examples/tests directory. >> >> The reason I ran this test is because I was seeing the same >> behavior with my finite element code (on perfectly benign >> problems). >> >> Is there a built-in test that you use to check that >> superlu_dist is working properly with petsc? >> i.e. something you know that works with with petsc 3.3-p5? >> >> >> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >> >> 2) Compare with MUMPS >> >> Matt >> >> -sanjay >> >> >> >> On 12/23/12 4:56 PM, Jed Brown wrote: >>> Where is your matrix? It might be ending up with a very bad >>> pivot. If the problem can be reproduced, it should be >>> reported to the SuperLU_DIST developers to fix. (Note that >>> we do not see this with other matrices.) You can also try MUMPS. >>> >>> >>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>> > wrote: >>> >>> I wanted to use SuperLU Dist to perform a direct solve >>> but seem to be encountering >>> a problem. I was wonder if this is a know issue and if >>> there is a solution for it. >>> >>> The problem is easily observed using ex6.c in >>> src/ksp/ksp/examples/tests. >>> >>> Out of the box: make runex6 produces a residual error of >>> O(1e-11), all is well. >>> >>> I then changed the run to run on two processors and add >>> the flag >>> -pc_factor_mat_solver_package spooles this produces a >>> residual error of O(1e-11), all is still well. >>> >>> I then switch over to -pc_factor_mat_solver_package >>> superlu_dist and the >>> residual error comes back as 22.6637! Something seems >>> very wrong. 
>>> >>> My build is perfectly vanilla: >>> >>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>> export PETSC_ARCH=intel >>> >>> ./configure --with-cc=icc --with-fc=ifort \ >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>> >>> -sanjay >>> >>> >> >> -- >> ----------------------------------------------- >> Sanjay Govindjee, PhD, PE >> Professor of Civil Engineering >> Vice Chair for Academic Affairs >> >> 779 Davis Hall >> Structural Engineering, Mechanics and Materials >> Department of Civil Engineering >> University of California >> Berkeley, CA 94720-1710 >> >> Voice:+1 510 642 6060 >> FAX:+1 510 643 5264 >> s_g at berkeley.edu >> http://www.ce.berkeley.edu/~sanjay >> ----------------------------------------------- >> >> New Books: >> >> Engineering Mechanics of Deformable >> Solids: A Presentation with Exercises >> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >> http://ukcatalogue.oup.com/product/9780199651641.do >> http://amzn.com/0199651647 >> >> >> Engineering Mechanics 3 (Dynamics) >> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >> http://amzn.com/3642140181 >> >> ----------------------------------------------- >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed Dec 26 15:23:33 2012 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Wed, 26 Dec 2012 15:23:33 -0600 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50DB6892.5040402@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> Message-ID: Sanjay: I get petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec -n 2 ./ex2 -ksp_monitor_short -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -m 500 -n 500 Norm of error 1.92279e-11 iterations 1 Hong > I have done some more testing of the problem, continuing with > src/ksp/ksp/examples/tutorials/ex2.c. > > The behavior I am seeing is that with smaller problems sizes superlu_dist is > behaving properly > but with larger problem sizes things seem to go wrong and what goes wrong is > apparently consistent; the error appears both with my intel build as well as > with my gcc build. 
> > I have two run lines: > > runex2superlu: > -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 -ksp_type > preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist > > runex2spooles: > -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 -ksp_type > preonly -pc_type lu -pc_factor_mat_solver_package spooles > > From my intel build, I get > > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 7.66145e-13 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.21422e-12 iterations 1 > > From my GCC build, I get > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 7.66145e-13 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.21422e-12 iterations 1 > > If I change the -m 100 -n 100 to -m 500 -n 500, I get for my intel build > > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 419.953 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.69468e-10 iterations 1 > > From my GCC build with -m 500 -n 500, I get > > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 419.953 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.69468e-10 iterations 1 > > > Any suggestions will be greatly appreciated. > > -sanjay > > > > > > > > On 12/23/12 6:42 PM, Matthew Knepley wrote: > > > On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee wrote: >> >> I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how >> to convert the run lines for snes/examples/ex5.c to work with a direct >> solver as I am not versed in SNES options. >> >> Notwithstanding something strange is happening only on select examples. >> With ksp/ksp/exampeles/tutorials/ex2.c and the run line: >> >> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly >> -pc_type lu -pc_factor_mat_solver_package superlu_dist >> >> I get good results (of the order): >> >> Norm of error 1.85464e-14 iterations 1 >> >> using both superlu_dist and spooles. >> >> My BLAS/LAPACK: -llapack -lblas (so native to my machine). >> >> If you can guide me on a run line for the snes ex5.c I can try that too. >> I'll also try to construct a GCC build later to see if that is an issue. > > > Same line on ex5, but ex2 is good enough. However, it will not tell us > anything new. Try another build. > > Matt > >> >> -sanjay >> >> >> On 12/23/12 5:58 PM, Matthew Knepley wrote: >> >> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >> wrote: >>> >>> Not sure what you mean by where is your matrix? I am simply running ex6 >>> in the ksp/examples/tests directory. >>> >>> The reason I ran this test is because I was seeing the same behavior with >>> my finite element code (on perfectly benign problems). >>> >>> Is there a built-in test that you use to check that superlu_dist is >>> working properly with petsc? >>> i.e. something you know that works with with petsc 3.3-p5? >> >> >> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >> >> 2) Compare with MUMPS >> >> Matt >> >>> >>> -sanjay >>> >>> >>> >>> On 12/23/12 4:56 PM, Jed Brown wrote: >>> >>> Where is your matrix? It might be ending up with a very bad pivot. If the >>> problem can be reproduced, it should be reported to the SuperLU_DIST >>> developers to fix. (Note that we do not see this with other matrices.) You >>> can also try MUMPS. 
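For the MUMPS comparison suggested here, the same ex2 test can be rerun with only the solver package changed. This is a sketch and assumes PETSc was configured with --download-mumps, which in this release also needs ScaLAPACK (e.g. --download-scalapack):

    ${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 500 -n 500 -ksp_type preonly \
        -pc_type lu -pc_factor_mat_solver_package mumps

If MUMPS and spooles both give small errors while superlu_dist does not, that isolates the problem to the superlu_dist path rather than to the matrix or the test harness.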
>>> >>> >>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>> wrote: >>>> >>>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>>> encountering >>>> a problem. I was wonder if this is a know issue and if there is a >>>> solution for it. >>>> >>>> The problem is easily observed using ex6.c in >>>> src/ksp/ksp/examples/tests. >>>> >>>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>>> is well. >>>> >>>> I then changed the run to run on two processors and add the flag >>>> -pc_factor_mat_solver_package spooles this produces a residual error of >>>> O(1e-11), all is still well. >>>> >>>> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >>>> residual error comes back as 22.6637! Something seems very wrong. >>>> >>>> My build is perfectly vanilla: >>>> >>>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>>> export PETSC_ARCH=intel >>>> >>>> ./configure --with-cc=icc --with-fc=ifort \ >>>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>>> >>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>>> >>>> -sanjay >>> >>> >>> >>> -- >>> ----------------------------------------------- >>> Sanjay Govindjee, PhD, PE >>> Professor of Civil Engineering >>> Vice Chair for Academic Affairs >>> >>> 779 Davis Hall >>> Structural Engineering, Mechanics and Materials >>> Department of Civil Engineering >>> University of California >>> Berkeley, CA 94720-1710 >>> >>> Voice: +1 510 642 6060 >>> FAX: +1 510 643 5264 >>> s_g at berkeley.edu >>> http://www.ce.berkeley.edu/~sanjay >>> ----------------------------------------------- >>> >>> New Books: >>> >>> Engineering Mechanics of Deformable >>> Solids: A Presentation with Exercises >>> >>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>> http://ukcatalogue.oup.com/product/9780199651641.do >>> http://amzn.com/0199651647 >>> >>> >>> Engineering Mechanics 3 (Dynamics) >>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>> http://amzn.com/3642140181 >>> >>> ----------------------------------------------- >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > > From s_g at berkeley.edu Wed Dec 26 15:28:37 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 26 Dec 2012 13:28:37 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> Message-ID: <50DB6C05.4090006@berkeley.edu> hmmm....I guess that is good news -- in that superlu is not broken. However, for me not so good news since I seems that there is nasty bug lurking on my machine. Any suggestions on chasing down the error? 
On 12/26/12 1:23 PM, Hong Zhang wrote: > Sanjay: > I get > petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec -n 2 ./ex2 > -ksp_monitor_short -ksp_type preonly -pc_type lu > -pc_factor_mat_solver_package superlu_dist -m 500 -n 500 > Norm of error 1.92279e-11 iterations 1 > > Hong > >> I have done some more testing of the problem, continuing with >> src/ksp/ksp/examples/tutorials/ex2.c. >> >> The behavior I am seeing is that with smaller problems sizes superlu_dist is >> behaving properly >> but with larger problem sizes things seem to go wrong and what goes wrong is >> apparently consistent; the error appears both with my intel build as well as >> with my gcc build. >> >> I have two run lines: >> >> runex2superlu: >> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 -ksp_type >> preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist >> >> runex2spooles: >> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 -ksp_type >> preonly -pc_type lu -pc_factor_mat_solver_package spooles >> >> From my intel build, I get >> >> sg-macbook-prolocal:tutorials sg$ make runex2superlu >> Norm of error 7.66145e-13 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make runex2spooles >> Norm of error 2.21422e-12 iterations 1 >> >> From my GCC build, I get >> sg-macbook-prolocal:tutorials sg$ make runex2superlu >> Norm of error 7.66145e-13 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make runex2spooles >> Norm of error 2.21422e-12 iterations 1 >> >> If I change the -m 100 -n 100 to -m 500 -n 500, I get for my intel build >> >> sg-macbook-prolocal:tutorials sg$ make runex2superlu >> Norm of error 419.953 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make runex2spooles >> Norm of error 2.69468e-10 iterations 1 >> >> From my GCC build with -m 500 -n 500, I get >> >> sg-macbook-prolocal:tutorials sg$ make runex2superlu >> Norm of error 419.953 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make runex2spooles >> Norm of error 2.69468e-10 iterations 1 >> >> >> Any suggestions will be greatly appreciated. >> >> -sanjay >> >> >> >> >> >> >> >> On 12/23/12 6:42 PM, Matthew Knepley wrote: >> >> >> On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee wrote: >>> I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how >>> to convert the run lines for snes/examples/ex5.c to work with a direct >>> solver as I am not versed in SNES options. >>> >>> Notwithstanding something strange is happening only on select examples. >>> With ksp/ksp/exampeles/tutorials/ex2.c and the run line: >>> >>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly >>> -pc_type lu -pc_factor_mat_solver_package superlu_dist >>> >>> I get good results (of the order): >>> >>> Norm of error 1.85464e-14 iterations 1 >>> >>> using both superlu_dist and spooles. >>> >>> My BLAS/LAPACK: -llapack -lblas (so native to my machine). >>> >>> If you can guide me on a run line for the snes ex5.c I can try that too. >>> I'll also try to construct a GCC build later to see if that is an issue. >> >> Same line on ex5, but ex2 is good enough. However, it will not tell us >> anything new. Try another build. >> >> Matt >> >>> -sanjay >>> >>> >>> On 12/23/12 5:58 PM, Matthew Knepley wrote: >>> >>> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >>> wrote: >>>> Not sure what you mean by where is your matrix? I am simply running ex6 >>>> in the ksp/examples/tests directory. 
>>>> >>>> The reason I ran this test is because I was seeing the same behavior with >>>> my finite element code (on perfectly benign problems). >>>> >>>> Is there a built-in test that you use to check that superlu_dist is >>>> working properly with petsc? >>>> i.e. something you know that works with with petsc 3.3-p5? >>> >>> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >>> >>> 2) Compare with MUMPS >>> >>> Matt >>> >>>> -sanjay >>>> >>>> >>>> >>>> On 12/23/12 4:56 PM, Jed Brown wrote: >>>> >>>> Where is your matrix? It might be ending up with a very bad pivot. If the >>>> problem can be reproduced, it should be reported to the SuperLU_DIST >>>> developers to fix. (Note that we do not see this with other matrices.) You >>>> can also try MUMPS. >>>> >>>> >>>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>>> wrote: >>>>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>>>> encountering >>>>> a problem. I was wonder if this is a know issue and if there is a >>>>> solution for it. >>>>> >>>>> The problem is easily observed using ex6.c in >>>>> src/ksp/ksp/examples/tests. >>>>> >>>>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>>>> is well. >>>>> >>>>> I then changed the run to run on two processors and add the flag >>>>> -pc_factor_mat_solver_package spooles this produces a residual error of >>>>> O(1e-11), all is still well. >>>>> >>>>> I then switch over to -pc_factor_mat_solver_package superlu_dist and the >>>>> residual error comes back as 22.6637! Something seems very wrong. >>>>> >>>>> My build is perfectly vanilla: >>>>> >>>>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>>>> export PETSC_ARCH=intel >>>>> >>>>> ./configure --with-cc=icc --with-fc=ifort \ >>>>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>>>> >>>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>>>> >>>>> -sanjay >>>> >>>> >>>> -- >>>> ----------------------------------------------- >>>> Sanjay Govindjee, PhD, PE >>>> Professor of Civil Engineering >>>> Vice Chair for Academic Affairs >>>> >>>> 779 Davis Hall >>>> Structural Engineering, Mechanics and Materials >>>> Department of Civil Engineering >>>> University of California >>>> Berkeley, CA 94720-1710 >>>> >>>> Voice: +1 510 642 6060 >>>> FAX: +1 510 643 5264 >>>> s_g at berkeley.edu >>>> http://www.ce.berkeley.edu/~sanjay >>>> ----------------------------------------------- >>>> >>>> New Books: >>>> >>>> Engineering Mechanics of Deformable >>>> Solids: A Presentation with Exercises >>>> >>>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>>> http://ukcatalogue.oup.com/product/9780199651641.do >>>> http://amzn.com/0199651647 >>>> >>>> >>>> Engineering Mechanics 3 (Dynamics) >>>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>>> http://amzn.com/3642140181 >>>> >>>> ----------------------------------------------- >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their experiments >> is infinitely more interesting than any results to which their experiments >> lead. 
>> -- Norbert Wiener >> >> -- ----------------------------------------------- Sanjay Govindjee, PhD, PE Professor of Civil Engineering Vice Chair for Academic Affairs 779 Davis Hall Structural Engineering, Mechanics and Materials Department of Civil Engineering University of California Berkeley, CA 94720-1710 Voice: +1 510 642 6060 FAX: +1 510 643 5264 s_g at berkeley.edu http://www.ce.berkeley.edu/~sanjay ----------------------------------------------- New Books: Engineering Mechanics of Deformable Solids: A Presentation with Exercises http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 http://ukcatalogue.oup.com/product/9780199651641.do http://amzn.com/0199651647 Engineering Mechanics 3 (Dynamics) http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 http://amzn.com/3642140181 ----------------------------------------------- From hzhang at mcs.anl.gov Wed Dec 26 15:34:38 2012 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Wed, 26 Dec 2012 15:34:38 -0600 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50DB6C05.4090006@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> Message-ID: Sanjay: > hmmm....I guess that is good news -- in that superlu is not broken. However, > for me > not so good news since I seems that there is nasty bug lurking on my > machine. > > Any suggestions on chasing down the error? How did you install your supelu_dist with petsc-3.3? What machine do you use? Hong > > > On 12/26/12 1:23 PM, Hong Zhang wrote: >> >> Sanjay: >> I get >> petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec -n 2 ./ex2 >> -ksp_monitor_short -ksp_type preonly -pc_type lu >> -pc_factor_mat_solver_package superlu_dist -m 500 -n 500 >> Norm of error 1.92279e-11 iterations 1 >> >> Hong >> >>> I have done some more testing of the problem, continuing with >>> src/ksp/ksp/examples/tutorials/ex2.c. >>> >>> The behavior I am seeing is that with smaller problems sizes superlu_dist >>> is >>> behaving properly >>> but with larger problem sizes things seem to go wrong and what goes wrong >>> is >>> apparently consistent; the error appears both with my intel build as well >>> as >>> with my gcc build. 
>>> >>> I have two run lines: >>> >>> runex2superlu: >>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 >>> -ksp_type >>> preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist >>> >>> runex2spooles: >>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 >>> -ksp_type >>> preonly -pc_type lu -pc_factor_mat_solver_package spooles >>> >>> From my intel build, I get >>> >>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>> Norm of error 7.66145e-13 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>> Norm of error 2.21422e-12 iterations 1 >>> >>> From my GCC build, I get >>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>> Norm of error 7.66145e-13 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>> Norm of error 2.21422e-12 iterations 1 >>> >>> If I change the -m 100 -n 100 to -m 500 -n 500, I get for my intel build >>> >>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>> Norm of error 419.953 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>> Norm of error 2.69468e-10 iterations 1 >>> >>> From my GCC build with -m 500 -n 500, I get >>> >>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>> Norm of error 419.953 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>> Norm of error 2.69468e-10 iterations 1 >>> >>> >>> Any suggestions will be greatly appreciated. >>> >>> -sanjay >>> >>> >>> >>> >>> >>> >>> >>> On 12/23/12 6:42 PM, Matthew Knepley wrote: >>> >>> >>> On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee >>> wrote: >>>> >>>> I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how >>>> to convert the run lines for snes/examples/ex5.c to work with a direct >>>> solver as I am not versed in SNES options. >>>> >>>> Notwithstanding something strange is happening only on select examples. >>>> With ksp/ksp/exampeles/tutorials/ex2.c and the run line: >>>> >>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly >>>> -pc_type lu -pc_factor_mat_solver_package superlu_dist >>>> >>>> I get good results (of the order): >>>> >>>> Norm of error 1.85464e-14 iterations 1 >>>> >>>> using both superlu_dist and spooles. >>>> >>>> My BLAS/LAPACK: -llapack -lblas (so native to my machine). >>>> >>>> If you can guide me on a run line for the snes ex5.c I can try that too. >>>> I'll also try to construct a GCC build later to see if that is an issue. >>> >>> >>> Same line on ex5, but ex2 is good enough. However, it will not tell us >>> anything new. Try another build. >>> >>> Matt >>> >>>> -sanjay >>>> >>>> >>>> On 12/23/12 5:58 PM, Matthew Knepley wrote: >>>> >>>> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >>>> wrote: >>>>> >>>>> Not sure what you mean by where is your matrix? I am simply running >>>>> ex6 >>>>> in the ksp/examples/tests directory. >>>>> >>>>> The reason I ran this test is because I was seeing the same behavior >>>>> with >>>>> my finite element code (on perfectly benign problems). >>>>> >>>>> Is there a built-in test that you use to check that superlu_dist is >>>>> working properly with petsc? >>>>> i.e. something you know that works with with petsc 3.3-p5? >>>> >>>> >>>> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >>>> >>>> 2) Compare with MUMPS >>>> >>>> Matt >>>> >>>>> -sanjay >>>>> >>>>> >>>>> >>>>> On 12/23/12 4:56 PM, Jed Brown wrote: >>>>> >>>>> Where is your matrix? It might be ending up with a very bad pivot. 
If >>>>> the >>>>> problem can be reproduced, it should be reported to the SuperLU_DIST >>>>> developers to fix. (Note that we do not see this with other matrices.) >>>>> You >>>>> can also try MUMPS. >>>>> >>>>> >>>>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>>>> wrote: >>>>>> >>>>>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>>>>> encountering >>>>>> a problem. I was wonder if this is a know issue and if there is a >>>>>> solution for it. >>>>>> >>>>>> The problem is easily observed using ex6.c in >>>>>> src/ksp/ksp/examples/tests. >>>>>> >>>>>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>>>>> is well. >>>>>> >>>>>> I then changed the run to run on two processors and add the flag >>>>>> -pc_factor_mat_solver_package spooles this produces a residual error >>>>>> of >>>>>> O(1e-11), all is still well. >>>>>> >>>>>> I then switch over to -pc_factor_mat_solver_package superlu_dist and >>>>>> the >>>>>> residual error comes back as 22.6637! Something seems very wrong. >>>>>> >>>>>> My build is perfectly vanilla: >>>>>> >>>>>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>>>>> export PETSC_ARCH=intel >>>>>> >>>>>> ./configure --with-cc=icc --with-fc=ifort \ >>>>>> >>>>>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>>>>> >>>>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>>>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>>>>> >>>>>> -sanjay >>>>> >>>>> >>>>> >>>>> -- >>>>> ----------------------------------------------- >>>>> Sanjay Govindjee, PhD, PE >>>>> Professor of Civil Engineering >>>>> Vice Chair for Academic Affairs >>>>> >>>>> 779 Davis Hall >>>>> Structural Engineering, Mechanics and Materials >>>>> Department of Civil Engineering >>>>> University of California >>>>> Berkeley, CA 94720-1710 >>>>> >>>>> Voice: +1 510 642 6060 >>>>> FAX: +1 510 643 5264 >>>>> s_g at berkeley.edu >>>>> http://www.ce.berkeley.edu/~sanjay >>>>> ----------------------------------------------- >>>>> >>>>> New Books: >>>>> >>>>> Engineering Mechanics of Deformable >>>>> Solids: A Presentation with Exercises >>>>> >>>>> >>>>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>>>> http://ukcatalogue.oup.com/product/9780199651641.do >>>>> http://amzn.com/0199651647 >>>>> >>>>> >>>>> Engineering Mechanics 3 (Dynamics) >>>>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>>>> http://amzn.com/3642140181 >>>>> >>>>> ----------------------------------------------- >>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which >>>> their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments >>> is infinitely more interesting than any results to which their >>> experiments >>> lead. 
>>> -- Norbert Wiener >>> >>> > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice: +1 510 642 6060 > FAX: +1 510 643 5264 > s_g at berkeley.edu > http://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exercises > http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 > http://ukcatalogue.oup.com/product/9780199651641.do > http://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics) > http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 > http://amzn.com/3642140181 > > ----------------------------------------------- > From s_g at berkeley.edu Wed Dec 26 15:38:24 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 26 Dec 2012 13:38:24 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> Message-ID: <50DB6E50.3050001@berkeley.edu> I have a macbook pro (Mac OS X 10.7.5) % uname -a Darwin sg-macbook-prolocal.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 I configured using: ./configure --with-cc=icc --with-fc=ifort -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} so everything was built together. On 12/26/12 1:34 PM, Hong Zhang wrote: > Sanjay: >> hmmm....I guess that is good news -- in that superlu is not broken. However, >> for me >> not so good news since I seems that there is nasty bug lurking on my >> machine. >> >> Any suggestions on chasing down the error? > How did you install your supelu_dist with petsc-3.3? > What machine do you use? > > Hong >> >> On 12/26/12 1:23 PM, Hong Zhang wrote: >>> Sanjay: >>> I get >>> petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec -n 2 ./ex2 >>> -ksp_monitor_short -ksp_type preonly -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist -m 500 -n 500 >>> Norm of error 1.92279e-11 iterations 1 >>> >>> Hong >>> >>>> I have done some more testing of the problem, continuing with >>>> src/ksp/ksp/examples/tutorials/ex2.c. >>>> >>>> The behavior I am seeing is that with smaller problems sizes superlu_dist >>>> is >>>> behaving properly >>>> but with larger problem sizes things seem to go wrong and what goes wrong >>>> is >>>> apparently consistent; the error appears both with my intel build as well >>>> as >>>> with my gcc build. 
>>>> >>>> I have two run lines: >>>> >>>> runex2superlu: >>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 >>>> -ksp_type >>>> preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist >>>> >>>> runex2spooles: >>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 >>>> -ksp_type >>>> preonly -pc_type lu -pc_factor_mat_solver_package spooles >>>> >>>> From my intel build, I get >>>> >>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>> Norm of error 7.66145e-13 iterations 1 >>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>> Norm of error 2.21422e-12 iterations 1 >>>> >>>> From my GCC build, I get >>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>> Norm of error 7.66145e-13 iterations 1 >>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>> Norm of error 2.21422e-12 iterations 1 >>>> >>>> If I change the -m 100 -n 100 to -m 500 -n 500, I get for my intel build >>>> >>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>> Norm of error 419.953 iterations 1 >>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>> Norm of error 2.69468e-10 iterations 1 >>>> >>>> From my GCC build with -m 500 -n 500, I get >>>> >>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>> Norm of error 419.953 iterations 1 >>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>> Norm of error 2.69468e-10 iterations 1 >>>> >>>> >>>> Any suggestions will be greatly appreciated. >>>> >>>> -sanjay >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 12/23/12 6:42 PM, Matthew Knepley wrote: >>>> >>>> >>>> On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee >>>> wrote: >>>>> I decided to go with ksp/ksp/exampeles/tutorials/ex2.c; I was unsure how >>>>> to convert the run lines for snes/examples/ex5.c to work with a direct >>>>> solver as I am not versed in SNES options. >>>>> >>>>> Notwithstanding something strange is happening only on select examples. >>>>> With ksp/ksp/exampeles/tutorials/ex2.c and the run line: >>>>> >>>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type preonly >>>>> -pc_type lu -pc_factor_mat_solver_package superlu_dist >>>>> >>>>> I get good results (of the order): >>>>> >>>>> Norm of error 1.85464e-14 iterations 1 >>>>> >>>>> using both superlu_dist and spooles. >>>>> >>>>> My BLAS/LAPACK: -llapack -lblas (so native to my machine). >>>>> >>>>> If you can guide me on a run line for the snes ex5.c I can try that too. >>>>> I'll also try to construct a GCC build later to see if that is an issue. >>>> >>>> Same line on ex5, but ex2 is good enough. However, it will not tell us >>>> anything new. Try another build. >>>> >>>> Matt >>>> >>>>> -sanjay >>>>> >>>>> >>>>> On 12/23/12 5:58 PM, Matthew Knepley wrote: >>>>> >>>>> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >>>>> wrote: >>>>>> Not sure what you mean by where is your matrix? I am simply running >>>>>> ex6 >>>>>> in the ksp/examples/tests directory. >>>>>> >>>>>> The reason I ran this test is because I was seeing the same behavior >>>>>> with >>>>>> my finite element code (on perfectly benign problems). >>>>>> >>>>>> Is there a built-in test that you use to check that superlu_dist is >>>>>> working properly with petsc? >>>>>> i.e. something you know that works with with petsc 3.3-p5? >>>>> >>>>> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >>>>> >>>>> 2) Compare with MUMPS >>>>> >>>>> Matt >>>>> >>>>>> -sanjay >>>>>> >>>>>> >>>>>> >>>>>> On 12/23/12 4:56 PM, Jed Brown wrote: >>>>>> >>>>>> Where is your matrix? 
It might be ending up with a very bad pivot. If >>>>>> the >>>>>> problem can be reproduced, it should be reported to the SuperLU_DIST >>>>>> developers to fix. (Note that we do not see this with other matrices.) >>>>>> You >>>>>> can also try MUMPS. >>>>>> >>>>>> >>>>>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>>>>> wrote: >>>>>>> I wanted to use SuperLU Dist to perform a direct solve but seem to be >>>>>>> encountering >>>>>>> a problem. I was wonder if this is a know issue and if there is a >>>>>>> solution for it. >>>>>>> >>>>>>> The problem is easily observed using ex6.c in >>>>>>> src/ksp/ksp/examples/tests. >>>>>>> >>>>>>> Out of the box: make runex6 produces a residual error of O(1e-11), all >>>>>>> is well. >>>>>>> >>>>>>> I then changed the run to run on two processors and add the flag >>>>>>> -pc_factor_mat_solver_package spooles this produces a residual error >>>>>>> of >>>>>>> O(1e-11), all is still well. >>>>>>> >>>>>>> I then switch over to -pc_factor_mat_solver_package superlu_dist and >>>>>>> the >>>>>>> residual error comes back as 22.6637! Something seems very wrong. >>>>>>> >>>>>>> My build is perfectly vanilla: >>>>>>> >>>>>>> export PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>>>>>> export PETSC_ARCH=intel >>>>>>> >>>>>>> ./configure --with-cc=icc --with-fc=ifort \ >>>>>>> >>>>>>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>>>>>> >>>>>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel all >>>>>>> make PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel test >>>>>>> >>>>>>> -sanjay >>>>>> >>>>>> >>>>>> -- >>>>>> ----------------------------------------------- >>>>>> Sanjay Govindjee, PhD, PE >>>>>> Professor of Civil Engineering >>>>>> Vice Chair for Academic Affairs >>>>>> >>>>>> 779 Davis Hall >>>>>> Structural Engineering, Mechanics and Materials >>>>>> Department of Civil Engineering >>>>>> University of California >>>>>> Berkeley, CA 94720-1710 >>>>>> >>>>>> Voice: +1 510 642 6060 >>>>>> FAX: +1 510 643 5264 >>>>>> s_g at berkeley.edu >>>>>> http://www.ce.berkeley.edu/~sanjay >>>>>> ----------------------------------------------- >>>>>> >>>>>> New Books: >>>>>> >>>>>> Engineering Mechanics of Deformable >>>>>> Solids: A Presentation with Exercises >>>>>> >>>>>> >>>>>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>>>>> http://ukcatalogue.oup.com/product/9780199651641.do >>>>>> http://amzn.com/0199651647 >>>>>> >>>>>> >>>>>> Engineering Mechanics 3 (Dynamics) >>>>>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>>>>> http://amzn.com/3642140181 >>>>>> >>>>>> ----------------------------------------------- >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which >>>>> their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments >>>> is infinitely more interesting than any results to which their >>>> experiments >>>> lead. 
>>>> -- Norbert Wiener >>>> >>>> >> -- >> ----------------------------------------------- >> Sanjay Govindjee, PhD, PE >> Professor of Civil Engineering >> Vice Chair for Academic Affairs >> >> 779 Davis Hall >> Structural Engineering, Mechanics and Materials >> Department of Civil Engineering >> University of California >> Berkeley, CA 94720-1710 >> >> Voice: +1 510 642 6060 >> FAX: +1 510 643 5264 >> s_g at berkeley.edu >> http://www.ce.berkeley.edu/~sanjay >> ----------------------------------------------- >> >> New Books: >> >> Engineering Mechanics of Deformable >> Solids: A Presentation with Exercises >> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >> http://ukcatalogue.oup.com/product/9780199651641.do >> http://amzn.com/0199651647 >> >> >> Engineering Mechanics 3 (Dynamics) >> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >> http://amzn.com/3642140181 >> >> ----------------------------------------------- >> -- ----------------------------------------------- Sanjay Govindjee, PhD, PE Professor of Civil Engineering Vice Chair for Academic Affairs 779 Davis Hall Structural Engineering, Mechanics and Materials Department of Civil Engineering University of California Berkeley, CA 94720-1710 Voice: +1 510 642 6060 FAX: +1 510 643 5264 s_g at berkeley.edu http://www.ce.berkeley.edu/~sanjay ----------------------------------------------- New Books: Engineering Mechanics of Deformable Solids: A Presentation with Exercises http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 http://ukcatalogue.oup.com/product/9780199651641.do http://amzn.com/0199651647 Engineering Mechanics 3 (Dynamics) http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 http://amzn.com/3642140181 ----------------------------------------------- From knepley at gmail.com Wed Dec 26 17:08:07 2012 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 26 Dec 2012 18:08:07 -0500 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: <50DB6E50.3050001@berkeley.edu> References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> <50DB6E50.3050001@berkeley.edu> Message-ID: On Wed, Dec 26, 2012 at 4:38 PM, Sanjay Govindjee wrote: > I have a macbook pro (Mac OS X 10.7.5) > > % uname -a > Darwin sg-macbook-prolocal.local 11.4.2 Darwin Kernel Version 11.4.2: Thu > Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_**X86_64 x86_64 > > I configured using: > > > ./configure --with-cc=icc --with-fc=ifort -download-{spooles,parmetis,** > superlu_dist,prometheus,mpich,**ml,hypre,metis} > > so everything was built together. Since a) you have tried other compilers b) we cannot reproduce it c) we are building the library during configure I would guess that some outside library, in your default link path, is contaminating the executable with symbols which override some of those in SuperLU. The SuperLU people are not super careful about naming. Could you 1) Try this same exercise using --with-shared-libraries 2) Once you do that, use otool -L on the executable so we can see where everything comes from Thanks, Matt > On 12/26/12 1:34 PM, Hong Zhang wrote: > >> Sanjay: >> >>> hmmm....I guess that is good news -- in that superlu is not broken. >>> However, >>> for me >>> not so good news since I seems that there is nasty bug lurking on my >>> machine. 
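Concretely, the two checks Matt asks for amount to something like the following sketch. It keeps the same --download options as the builds above and only switches to shared libraries; the PETSC_ARCH name is arbitrary, and PETSC_DIR is assumed to be set as in the earlier builds:

    ./configure PETSC_ARCH=gnu_shared --with-shared-libraries \
        -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis}
    make PETSC_ARCH=gnu_shared all
    cd src/ksp/ksp/examples/tutorials && make PETSC_ARCH=gnu_shared ex2
    otool -L ex2    # lists the dylibs the executable resolves, so anything picked up from outside the PETSC_ARCH tree shows up here

With a static build, otool -L only reports the system dylibs, so the shared rebuild is what makes the library resolution visible.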
>>> >>> Any suggestions on chasing down the error? >>> >> How did you install your supelu_dist with petsc-3.3? >> What machine do you use? >> >> Hong >> >>> >>> On 12/26/12 1:23 PM, Hong Zhang wrote: >>> >>>> Sanjay: >>>> I get >>>> petsc-3.3/src/ksp/ksp/**examples/tutorials>mpiexec -n 2 ./ex2 >>>> -ksp_monitor_short -ksp_type preonly -pc_type lu >>>> -pc_factor_mat_solver_package superlu_dist -m 500 -n 500 >>>> Norm of error 1.92279e-11 iterations 1 >>>> >>>> Hong >>>> >>>> I have done some more testing of the problem, continuing with >>>>> src/ksp/ksp/examples/**tutorials/ex2.c. >>>>> >>>>> The behavior I am seeing is that with smaller problems sizes >>>>> superlu_dist >>>>> is >>>>> behaving properly >>>>> but with larger problem sizes things seem to go wrong and what goes >>>>> wrong >>>>> is >>>>> apparently consistent; the error appears both with my intel build as >>>>> well >>>>> as >>>>> with my gcc build. >>>>> >>>>> I have two run lines: >>>>> >>>>> runex2superlu: >>>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 >>>>> -ksp_type >>>>> preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist >>>>> >>>>> runex2spooles: >>>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 100 -n 100 >>>>> -ksp_type >>>>> preonly -pc_type lu -pc_factor_mat_solver_package spooles >>>>> >>>>> From my intel build, I get >>>>> >>>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>>> Norm of error 7.66145e-13 iterations 1 >>>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>>> Norm of error 2.21422e-12 iterations 1 >>>>> >>>>> From my GCC build, I get >>>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>>> Norm of error 7.66145e-13 iterations 1 >>>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>>> Norm of error 2.21422e-12 iterations 1 >>>>> >>>>> If I change the -m 100 -n 100 to -m 500 -n 500, I get for my intel >>>>> build >>>>> >>>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>>> Norm of error 419.953 iterations 1 >>>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>>> Norm of error 2.69468e-10 iterations 1 >>>>> >>>>> From my GCC build with -m 500 -n 500, I get >>>>> >>>>> sg-macbook-prolocal:tutorials sg$ make runex2superlu >>>>> Norm of error 419.953 iterations 1 >>>>> sg-macbook-prolocal:tutorials sg$ make runex2spooles >>>>> Norm of error 2.69468e-10 iterations 1 >>>>> >>>>> >>>>> Any suggestions will be greatly appreciated. >>>>> >>>>> -sanjay >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 12/23/12 6:42 PM, Matthew Knepley wrote: >>>>> >>>>> >>>>> On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee >>>>> wrote: >>>>> >>>>>> I decided to go with ksp/ksp/exampeles/tutorials/**ex2.c; I was >>>>>> unsure how >>>>>> to convert the run lines for snes/examples/ex5.c to work with a direct >>>>>> solver as I am not versed in SNES options. >>>>>> >>>>>> Notwithstanding something strange is happening only on select >>>>>> examples. >>>>>> With ksp/ksp/exampeles/tutorials/**ex2.c and the run line: >>>>>> >>>>>> -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 20 -n 20 -ksp_type >>>>>> preonly >>>>>> -pc_type lu -pc_factor_mat_solver_package superlu_dist >>>>>> >>>>>> I get good results (of the order): >>>>>> >>>>>> Norm of error 1.85464e-14 iterations 1 >>>>>> >>>>>> using both superlu_dist and spooles. >>>>>> >>>>>> My BLAS/LAPACK: -llapack -lblas (so native to my machine). >>>>>> >>>>>> If you can guide me on a run line for the snes ex5.c I can try that >>>>>> too. 
>>>>>> I'll also try to construct a GCC build later to see if that is an >>>>>> issue. >>>>>> >>>>> >>>>> Same line on ex5, but ex2 is good enough. However, it will not tell us >>>>> anything new. Try another build. >>>>> >>>>> Matt >>>>> >>>>> -sanjay >>>>>> >>>>>> >>>>>> On 12/23/12 5:58 PM, Matthew Knepley wrote: >>>>>> >>>>>> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay Govindjee >>>>>> wrote: >>>>>> >>>>>>> Not sure what you mean by where is your matrix? I am simply running >>>>>>> ex6 >>>>>>> in the ksp/examples/tests directory. >>>>>>> >>>>>>> The reason I ran this test is because I was seeing the same behavior >>>>>>> with >>>>>>> my finite element code (on perfectly benign problems). >>>>>>> >>>>>>> Is there a built-in test that you use to check that superlu_dist is >>>>>>> working properly with petsc? >>>>>>> i.e. something you know that works with with petsc 3.3-p5? >>>>>>> >>>>>> >>>>>> 1) Run it on a SNES ex5 (or KSP ex2), which is a nice Laplacian >>>>>> >>>>>> 2) Compare with MUMPS >>>>>> >>>>>> Matt >>>>>> >>>>>> -sanjay >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 12/23/12 4:56 PM, Jed Brown wrote: >>>>>>> >>>>>>> Where is your matrix? It might be ending up with a very bad pivot. If >>>>>>> the >>>>>>> problem can be reproduced, it should be reported to the SuperLU_DIST >>>>>>> developers to fix. (Note that we do not see this with other >>>>>>> matrices.) >>>>>>> You >>>>>>> can also try MUMPS. >>>>>>> >>>>>>> >>>>>>> On Sun, Dec 23, 2012 at 6:48 PM, Sanjay Govindjee >>>>>>> wrote: >>>>>>> >>>>>>>> I wanted to use SuperLU Dist to perform a direct solve but seem to >>>>>>>> be >>>>>>>> encountering >>>>>>>> a problem. I was wonder if this is a know issue and if there is a >>>>>>>> solution for it. >>>>>>>> >>>>>>>> The problem is easily observed using ex6.c in >>>>>>>> src/ksp/ksp/examples/tests. >>>>>>>> >>>>>>>> Out of the box: make runex6 produces a residual error of O(1e-11), >>>>>>>> all >>>>>>>> is well. >>>>>>>> >>>>>>>> I then changed the run to run on two processors and add the flag >>>>>>>> -pc_factor_mat_solver_package spooles this produces a residual >>>>>>>> error >>>>>>>> of >>>>>>>> O(1e-11), all is still well. >>>>>>>> >>>>>>>> I then switch over to -pc_factor_mat_solver_package superlu_dist and >>>>>>>> the >>>>>>>> residual error comes back as 22.6637! Something seems very wrong. 
>>>>>>>> >>>>>>>> My build is perfectly vanilla: >>>>>>>> >>>>>>>> export PETSC_DIR=/Users/sg/petsc-3.3-**p5/ >>>>>>>> export PETSC_ARCH=intel >>>>>>>> >>>>>>>> ./configure --with-cc=icc --with-fc=ifort \ >>>>>>>> >>>>>>>> -download-{spooles,parmetis,**superlu_dist,prometheus,mpich,** >>>>>>>> ml,hypre,metis} >>>>>>>> >>>>>>>> make PETSC_DIR=/Users/sg/petsc-3.3-**p5/ PETSC_ARCH=intel all >>>>>>>> make PETSC_DIR=/Users/sg/petsc-3.3-**p5/ PETSC_ARCH=intel test >>>>>>>> >>>>>>>> -sanjay >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ------------------------------**----------------- >>>>>>> Sanjay Govindjee, PhD, PE >>>>>>> Professor of Civil Engineering >>>>>>> Vice Chair for Academic Affairs >>>>>>> >>>>>>> 779 Davis Hall >>>>>>> Structural Engineering, Mechanics and Materials >>>>>>> Department of Civil Engineering >>>>>>> University of California >>>>>>> Berkeley, CA 94720-1710 >>>>>>> >>>>>>> Voice: +1 510 642 6060 >>>>>>> FAX: +1 510 643 5264 >>>>>>> s_g at berkeley.edu >>>>>>> http://www.ce.berkeley.edu/~**sanjay >>>>>>> ------------------------------**----------------- >>>>>>> >>>>>>> New Books: >>>>>>> >>>>>>> Engineering Mechanics of Deformable >>>>>>> Solids: A Presentation with Exercises >>>>>>> >>>>>>> >>>>>>> http://www.oup.com/us/catalog/**general/subject/Physics/** >>>>>>> MaterialsScience/?view=usa&ci=**9780199651641 >>>>>>> http://ukcatalogue.oup.com/**product/9780199651641.do >>>>>>> http://amzn.com/0199651647 >>>>>>> >>>>>>> >>>>>>> Engineering Mechanics 3 (Dynamics) >>>>>>> http://www.springer.com/**materials/mechanics/book/978-** >>>>>>> 3-642-14018-1 >>>>>>> http://amzn.com/3642140181 >>>>>>> >>>>>>> ------------------------------**----------------- >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which >>>>>> their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> >>>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments >>>>> is infinitely more interesting than any results to which their >>>>> experiments >>>>> lead. 
>>>>> -- Norbert Wiener >>>>> >>>>> >>>>> -- >>> ------------------------------**----------------- >>> Sanjay Govindjee, PhD, PE >>> Professor of Civil Engineering >>> Vice Chair for Academic Affairs >>> >>> 779 Davis Hall >>> Structural Engineering, Mechanics and Materials >>> Department of Civil Engineering >>> University of California >>> Berkeley, CA 94720-1710 >>> >>> Voice: +1 510 642 6060 >>> FAX: +1 510 643 5264 >>> s_g at berkeley.edu >>> http://www.ce.berkeley.edu/~**sanjay >>> ------------------------------**----------------- >>> >>> New Books: >>> >>> Engineering Mechanics of Deformable >>> Solids: A Presentation with Exercises >>> http://www.oup.com/us/catalog/**general/subject/Physics/** >>> MaterialsScience/?view=usa&ci=**9780199651641 >>> http://ukcatalogue.oup.com/**product/9780199651641.do >>> http://amzn.com/0199651647 >>> >>> >>> Engineering Mechanics 3 (Dynamics) >>> http://www.springer.com/**materials/mechanics/book/978-**3-642-14018-1 >>> http://amzn.com/3642140181 >>> >>> ------------------------------**----------------- >>> >>> > -- > ------------------------------**----------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice: +1 510 642 6060 > FAX: +1 510 643 5264 > s_g at berkeley.edu > http://www.ce.berkeley.edu/~**sanjay > ------------------------------**----------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exercises > http://www.oup.com/us/catalog/**general/subject/Physics/** > MaterialsScience/?view=usa&ci=**9780199651641 > http://ukcatalogue.oup.com/**product/9780199651641.do > http://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics) > http://www.springer.com/**materials/mechanics/book/978-**3-642-14018-1 > http://amzn.com/3642140181 > > ------------------------------**----------------- > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From s_g at berkeley.edu Wed Dec 26 19:24:25 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 26 Dec 2012 17:24:25 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> <50DB6E50.3050001@berkeley.edu> Message-ID: <50DBA349.7030307@berkeley.edu> I have re-configured/built using: ./configure PETSC_ARCH=gnu_shared -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} --with-shared-libraries make PETSC_ARCH=gnu_shared all make PETSC_ARCH=gnu_shared test Using the same test problem (src/ksp/ksp/examples/tutorials/ex2.c), on the 100x100 case I get: sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared runex2spooles Norm of error 2.21422e-12 iterations 1 sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared runex2superlu Norm of error 7.66145e-13 iterations 1 One the 500x500 case I get: sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared runex2spooles Norm of error 2.69468e-10 iterations 1 sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared runex2superlu Norm of error 419.953 iterations 1 otool shows: sg-macbook-prolocal:tutorials sg$ otool -L ex2 ex2: /Users/sg/petsc-3.3-p5/gnu_shared/lib/libpetsc.dylib (compatibility version 0.0.0, current version 0.0.0) /usr/X11/lib/libX11.6.dylib (compatibility version 10.0.0, current version 10.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpichcxx.dylib (compatibility version 0.0.0, current version 3.0.0) /usr/local/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.17.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libparmetis.dylib (compatibility version 0.0.0, current version 0.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmetis.dylib (compatibility version 0.0.0, current version 0.0.0) /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib (compatibility version 1.0.0, current version 1.0.0) /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 1.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpichf90.dylib (compatibility version 0.0.0, current version 3.0.0) /usr/local/lib/libgfortran.3.dylib (compatibility version 4.0.0, current version 4.0.0) /usr/local/lib/libquadmath.0.dylib (compatibility version 1.0.0, current version 1.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libpmpich.dylib (compatibility version 0.0.0, current version 3.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpich.dylib (compatibility version 0.0.0, current version 3.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libopa.1.dylib (compatibility version 2.0.0, current version 2.0.0) /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpl.1.dylib (compatibility version 3.0.0, current version 3.0.0) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0) /usr/local/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0) On 12/26/12 3:08 PM, Matthew Knepley wrote: > > On Wed, Dec 26, 2012 at 4:38 PM, Sanjay Govindjee > wrote: > > I have a macbook pro (Mac OS X 10.7.5) > > % uname -a > Darwin sg-macbook-prolocal.local 11.4.2 Darwin Kernel Version > 11.4.2: Thu Aug 23 16:25:48 PDT 2012; > root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 > > I configured using: > > > ./configure --with-cc=icc --with-fc=ifort > 
-download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} > > so everything was built together. > > > Since > > a) you have tried other compilers > > b) we cannot reproduce it > > c) we are building the library during configure > > I would guess that some outside library, in your default link path, is > contaminating > the executable with symbols which override some of those in SuperLU. > The SuperLU > people are not super careful about naming. Could you > > 1) Try this same exercise using --with-shared-libraries > > 2) Once you do that, use otool -L on the executable so we can see > where everything comes from > > Thanks, > > Matt > > On 12/26/12 1:34 PM, Hong Zhang wrote: > > Sanjay: > > hmmm....I guess that is good news -- in that superlu is > not broken. However, > for me > not so good news since I seems that there is nasty bug > lurking on my > machine. > > Any suggestions on chasing down the error? > > How did you install your supelu_dist with petsc-3.3? > What machine do you use? > > Hong > > > On 12/26/12 1:23 PM, Hong Zhang wrote: > > Sanjay: > I get > petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec -n 2 > ./ex2 > -ksp_monitor_short -ksp_type preonly -pc_type lu > -pc_factor_mat_solver_package superlu_dist -m 500 -n 500 > Norm of error 1.92279e-11 iterations 1 > > Hong > > I have done some more testing of the problem, > continuing with > src/ksp/ksp/examples/tutorials/ex2.c. > > The behavior I am seeing is that with smaller > problems sizes superlu_dist > is > behaving properly > but with larger problem sizes things seem to go > wrong and what goes wrong > is > apparently consistent; the error appears both with > my intel build as well > as > with my gcc build. > > I have two run lines: > > runex2superlu: > -@${MPIEXEC} -n 2 ./ex2 > -ksp_monitor_short -m 100 -n 100 > -ksp_type > preonly -pc_type lu -pc_factor_mat_solver_package > superlu_dist > > runex2spooles: > -@${MPIEXEC} -n 2 ./ex2 > -ksp_monitor_short -m 100 -n 100 > -ksp_type > preonly -pc_type lu -pc_factor_mat_solver_package > spooles > > From my intel build, I get > > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 7.66145e-13 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.21422e-12 iterations 1 > > From my GCC build, I get > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 7.66145e-13 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.21422e-12 iterations 1 > > If I change the -m 100 -n 100 to -m 500 -n 500, I > get for my intel build > > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 419.953 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.69468e-10 iterations 1 > > From my GCC build with -m 500 -n 500, I get > > sg-macbook-prolocal:tutorials sg$ make runex2superlu > Norm of error 419.953 iterations 1 > sg-macbook-prolocal:tutorials sg$ make runex2spooles > Norm of error 2.69468e-10 iterations 1 > > > Any suggestions will be greatly appreciated. > > -sanjay > > > > > > > > On 12/23/12 6:42 PM, Matthew Knepley wrote: > > > On Sun, Dec 23, 2012 at 9:37 PM, Sanjay Govindjee > > > wrote: > > I decided to go with > ksp/ksp/exampeles/tutorials/ex2.c; I was > unsure how > to convert the run lines for > snes/examples/ex5.c to work with a direct > solver as I am not versed in SNES options. > > Notwithstanding something strange is happening > only on select examples. 
From s_g at berkeley.edu  Wed Dec 26 19:34:56 2012
From: s_g at berkeley.edu (Sanjay Govindjee)
Date: Wed, 26 Dec 2012 17:34:56 -0800
Subject: [petsc-users] Using superlu_dist in a direct solve
In-Reply-To: <50DBA349.7030307@berkeley.edu>
References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> <50DB6E50.3050001@berkeley.edu> <50DBA349.7030307@berkeley.edu>
Message-ID: <50DBA5C0.4050807@berkeley.edu>

For what it is worth, I ran the problems with valgrind (before I built the --with-shared-libraries version).
With spooles the run is essentially clean.  With superlu I see lots of errors of the type:

==91099== Syscall param writev(vector[...]) points to uninitialised byte(s)
==91099==    at 0x1245FF2: writev (in /usr/lib/system/libsystem_kernel.dylib)
==91099==    by 0x101209846: MPIDU_Sock_writev (in ./ex2)
==91099==    by 0x101A2BA23: ???
==91099==    by 0x1FFFFFFFB: ???
==91099==    by 0x101A2BA0F: ???
==91099==    by 0x10852053F: ???
==91099==    by 0x101A24907: ???
==91099==    by 0x7FFF5FBFE2DF: ???
==91099==    by 0x1: ???
==91099==    by 0x10120AF13: MPIDI_CH3_iSendv (in ./ex2)
==91099==  Address 0x10712d0c8 is 136 bytes inside a block of size 1,661,792 alloc'd
==91099==    at 0xC713: malloc (vg_replace_malloc.c:271)
==91099==    by 0x100D5C6DF: superlu_malloc_dist (in ./ex2)
==91099==    by 0x100D23375: doubleMalloc_dist (in ./ex2)
==91099==    by 0x100D415C1: pdgstrs (in ./ex2)
==91099==    by 0x100D3F852: pdgssvx (in ./ex2)
==91099==    by 0x1007E5D38: MatSolve_SuperLU_DIST (in ./ex2)
==91099==    by 0x1002BDA1E: MatSolve (in ./ex2)
==91099==    by 0x1009EAF55: PCApply_LU (in ./ex2)
==91099==    by 0x100AAE053: PCApply (in ./ex2)
==91099==    by 0x100B1BCEE: KSPSolve_PREONLY (in ./ex2)
==91099==    by 0x100B54F55: KSPSolve (in ./ex2)
==91099==    by 0x1000022FC: main (in ./ex2)

On 12/26/12 5:24 PM, Sanjay Govindjee wrote:
> [...]
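The exact valgrind invocation is not shown in the thread; a minimal sketch of how such a run can be launched (assuming valgrind is installed and ex2 is the executable built above; each MPI rank is wrapped in its own valgrind instance with its own log file) is:

# %p expands to the PID of each rank's process in the log-file name
mpiexec -n 2 valgrind --tool=memcheck --track-origins=yes --log-file=valgrind.%p.log \
    ./ex2 -m 500 -n 500 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist

Memcheck commonly reports MPICH's writev of partially initialised send buffers even in correct programs; such reports can be filtered out with a --suppressions=<file> option so that genuine application errors stand out.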
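One way to pursue the symbol-contamination guess quoted above is to ask which of the images in the otool -L list actually export SuperLU_DIST routine names such as pdgssvx (the routine visible in the valgrind stack). This is only a diagnostic sketch, assuming the macOS nm/otool/awk tools and the ex2 executable from the gnu_shared build:

# check the executable itself first
nm -g ./ex2 | grep -i pdgssvx
# then check every dylib the executable loads
for lib in $(otool -L ./ex2 | awk 'NR>1 {print $1}'); do
    echo "== $lib"
    nm -g "$lib" 2>/dev/null | grep -i pdgssvx
done

If the same SuperLU name turns out to be defined in more than one image, the wrong copy may be the one getting called at run time.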
From knepley at gmail.com  Wed Dec 26 20:46:59 2012
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 26 Dec 2012 21:46:59 -0500
Subject: [petsc-users] Using superlu_dist in a direct solve
In-Reply-To: <50DBA5C0.4050807@berkeley.edu>
References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> <50DB6E50.3050001@berkeley.edu> <50DBA349.7030307@berkeley.edu> <50DBA5C0.4050807@berkeley.edu>
Message-ID: 

On Wed, Dec 26, 2012 at 8:34 PM, Sanjay Govindjee wrote:
> For what it is worth, I ran the problems with valgrind (before I built the --with-shared-libraries version).
> With spooles the run is essentially clean.  With superlu I see lots of errors of the type: [...]

This looks like a well-known MPICH problem with valgrind reporting. However, these stacks look strange. You should have source line numbers if this is compiled with debugging, and you should have the whole stack for MPICH.

Also, why is libquadmath being linked?

   Matt
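A side note on the missing line numbers: on OS X, compiling with -g is not always enough for valgrind to show file/line information, because the debug data lives in the object files or in a separate .dSYM bundle rather than in the executable. A sketch of how one might get symbolic stacks (assuming a valgrind built for OS X, which accepts a --dsymutil option, and the dsymutil tool from the Xcode command-line tools):

# generate ex2.dSYM so memcheck can resolve file/line information
dsymutil ./ex2
# or let valgrind run dsymutil itself
mpiexec -n 2 valgrind --dsymutil=yes --log-file=valgrind.%p.log \
    ./ex2 -m 500 -n 500 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist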
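On the libquadmath question: libquadmath is typically pulled into a link as a dependency of the GNU Fortran runtime (libgfortran) rather than requested by PETSc directly. Whether that is the case here can be checked against the library path reported by otool -L above:

# if libgfortran itself depends on libquadmath, its appearance in ex2 is just transitive
otool -L /usr/local/lib/libgfortran.3.dylib

If libquadmath shows up there, its presence in the ex2 link is a side effect of linking gfortran-compiled code and is probably unrelated to the wrong results.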
From s_g at berkeley.edu  Wed Dec 26 22:52:41 2012
From: s_g at berkeley.edu (Sanjay Govindjee)
Date: Wed, 26 Dec 2012 20:52:41 -0800
Subject: [petsc-users] Using superlu_dist in a direct solve
In-Reply-To: 
References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> <50DB6E50.3050001@berkeley.edu> <50DBA349.7030307@berkeley.edu> <50DBA5C0.4050807@berkeley.edu>
Message-ID: <50DBD419.6070204@berkeley.edu>

-g is definitely on.  I'll send the configure.log file to the PETSc maintenance e-mail, petsc-maint at mcs.anl.gov.
 -sanjay

On 12/26/12 6:46 PM, Matthew Knepley wrote:
> This looks like a well-known MPICH problem with valgrind reporting. However, these stacks look strange. You should have source line numbers if this is compiled with debugging, and you should have the whole stack for MPICH.
>
> Also, why is libquadmath being linked?
>
> [...]
> > Matt > > ==91099== Syscall param writev(vector[...]) points to > uninitialised byte(s) > ==91099== at 0x1245FF2: writev (in > /usr/lib/system/libsystem_kernel.dylib) > ==91099== by 0x101209846: MPIDU_Sock_writev (in ./ex2) > ==91099== by 0x101A2BA23: ??? > ==91099== by 0x1FFFFFFFB: ??? > ==91099== by 0x101A2BA0F: ??? > ==91099== by 0x10852053F: ??? > ==91099== by 0x101A24907: ??? > ==91099== by 0x7FFF5FBFE2DF: ??? > ==91099== by 0x1: ??? > ==91099== by 0x10120AF13: MPIDI_CH3_iSendv (in ./ex2) > ==91099== Address 0x10712d0c8 is 136 bytes inside a block of size > 1,661,792 alloc'd > ==91099== at 0xC713: malloc (vg_replace_malloc.c:271) > ==91099== by 0x100D5C6DF: superlu_malloc_dist (in ./ex2) > ==91099== by 0x100D23375: doubleMalloc_dist (in ./ex2) > ==91099== by 0x100D415C1: pdgstrs (in ./ex2) > ==91099== by 0x100D3F852: pdgssvx (in ./ex2) > ==91099== by 0x1007E5D38: MatSolve_SuperLU_DIST (in ./ex2) > ==91099== by 0x1002BDA1E: MatSolve (in ./ex2) > ==91099== by 0x1009EAF55: PCApply_LU (in ./ex2) > ==91099== by 0x100AAE053: PCApply (in ./ex2) > ==91099== by 0x100B1BCEE: KSPSolve_PREONLY (in ./ex2) > ==91099== by 0x100B54F55: KSPSolve (in ./ex2) > ==91099== by 0x1000022FC: main (in ./ex2) > > > > On 12/26/12 5:24 PM, Sanjay Govindjee wrote: >> I have re-configured/built using: >> >> ./configure PETSC_ARCH=gnu_shared >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> --with-shared-libraries >> >> make PETSC_ARCH=gnu_shared all >> >> make PETSC_ARCH=gnu_shared test >> >> >> Using the same test problem >> (src/ksp/ksp/examples/tutorials/ex2.c), on the 100x100 case I get: >> >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2spooles >> Norm of error 2.21422e-12 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2superlu >> Norm of error 7.66145e-13 iterations 1 >> >> One the 500x500 case I get: >> >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2spooles >> Norm of error 2.69468e-10 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2superlu >> Norm of error 419.953 iterations 1 >> >> otool shows: >> >> sg-macbook-prolocal:tutorials sg$ otool -L ex2 >> ex2: >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libpetsc.dylib >> (compatibility version 0.0.0, current version 0.0.0) >> /usr/X11/lib/libX11.6.dylib (compatibility version 10.0.0, >> current version 10.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpichcxx.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /usr/local/lib/libstdc++.6.dylib (compatibility version >> 7.0.0, current version 7.17.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libparmetis.dylib >> (compatibility version 0.0.0, current version 0.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmetis.dylib >> (compatibility version 0.0.0, current version 0.0.0) >> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib >> (compatibility version 1.0.0, current version 1.0.0) >> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib >> (compatibility version 1.0.0, current version 1.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpichf90.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /usr/local/lib/libgfortran.3.dylib (compatibility version >> 4.0.0, current version 4.0.0) >> /usr/local/lib/libquadmath.0.dylib (compatibility version >> 1.0.0, current version 1.0.0) >> 
/Users/sg/petsc-3.3-p5/gnu_shared/lib/libpmpich.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpich.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libopa.1.dylib >> (compatibility version 2.0.0, current version 2.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpl.1.dylib >> (compatibility version 3.0.0, current version 3.0.0) >> /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, >> current version 159.1.0) >> /usr/local/lib/libgcc_s.1.dylib (compatibility version 1.0.0, >> current version 1.0.0) >> >> >> >> >> On 12/26/12 3:08 PM, Matthew Knepley wrote: >>> >>> On Wed, Dec 26, 2012 at 4:38 PM, Sanjay Govindjee >>> > wrote: >>> >>> I have a macbook pro (Mac OS X 10.7.5) >>> >>> % uname -a >>> Darwin sg-macbook-prolocal.local 11.4.2 Darwin Kernel >>> Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; >>> root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 >>> >>> I configured using: >>> >>> >>> ./configure --with-cc=icc --with-fc=ifort >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> so everything was built together. >>> >>> >>> Since >>> >>> a) you have tried other compilers >>> >>> b) we cannot reproduce it >>> >>> c) we are building the library during configure >>> >>> I would guess that some outside library, in your default link >>> path, is contaminating >>> the executable with symbols which override some of those in >>> SuperLU. The SuperLU >>> people are not super careful about naming. Could you >>> >>> 1) Try this same exercise using --with-shared-libraries >>> >>> 2) Once you do that, use otool -L on the executable so we can >>> see where everything comes from >>> >>> Thanks, >>> >>> Matt >>> >>> On 12/26/12 1:34 PM, Hong Zhang wrote: >>> >>> Sanjay: >>> >>> hmmm....I guess that is good news -- in that superlu >>> is not broken. However, >>> for me >>> not so good news since I seems that there is nasty >>> bug lurking on my >>> machine. >>> >>> Any suggestions on chasing down the error? >>> >>> How did you install your supelu_dist with petsc-3.3? >>> What machine do you use? >>> >>> Hong >>> >>> >>> On 12/26/12 1:23 PM, Hong Zhang wrote: >>> >>> Sanjay: >>> I get >>> petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec >>> -n 2 ./ex2 >>> -ksp_monitor_short -ksp_type preonly -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist -m >>> 500 -n 500 >>> Norm of error 1.92279e-11 iterations 1 >>> >>> Hong >>> >>> I have done some more testing of the >>> problem, continuing with >>> src/ksp/ksp/examples/tutorials/ex2.c. >>> >>> The behavior I am seeing is that with >>> smaller problems sizes superlu_dist >>> is >>> behaving properly >>> but with larger problem sizes things seem to >>> go wrong and what goes wrong >>> is >>> apparently consistent; the error appears >>> both with my intel build as well >>> as >>> with my gcc build. 
>>> >>> I have two run lines: >>> >>> runex2superlu: >>> -@${MPIEXEC} -n 2 ./ex2 >>> -ksp_monitor_short -m 100 -n 100 >>> -ksp_type >>> preonly -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist >>> >>> runex2spooles: >>> -@${MPIEXEC} -n 2 ./ex2 >>> -ksp_monitor_short -m 100 -n 100 >>> -ksp_type >>> preonly -pc_type lu >>> -pc_factor_mat_solver_package spooles >>> >>> From my intel build, I get >>> >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 7.66145e-13 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.21422e-12 iterations 1 >>> >>> From my GCC build, I get >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 7.66145e-13 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.21422e-12 iterations 1 >>> >>> If I change the -m 100 -n 100 to -m 500 -n >>> 500, I get for my intel build >>> >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 419.953 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.69468e-10 iterations 1 >>> >>> From my GCC build with -m 500 -n 500, I get >>> >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 419.953 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.69468e-10 iterations 1 >>> >>> >>> Any suggestions will be greatly appreciated. >>> >>> -sanjay >>> >>> >>> >>> >>> >>> >>> >>> On 12/23/12 6:42 PM, Matthew Knepley wrote: >>> >>> >>> On Sun, Dec 23, 2012 at 9:37 PM, Sanjay >>> Govindjee >> > >>> wrote: >>> >>> I decided to go with >>> ksp/ksp/exampeles/tutorials/ex2.c; I was >>> unsure how >>> to convert the run lines for >>> snes/examples/ex5.c to work with a direct >>> solver as I am not versed in SNES options. >>> >>> Notwithstanding something strange is >>> happening only on select examples. >>> With ksp/ksp/exampeles/tutorials/ex2.c >>> and the run line: >>> >>> -@${MPIEXEC} -n 2 ./ex2 >>> -ksp_monitor_short -m 20 -n 20 -ksp_type >>> preonly >>> -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist >>> >>> I get good results (of the order): >>> >>> Norm of error 1.85464e-14 iterations 1 >>> >>> using both superlu_dist and spooles. >>> >>> My BLAS/LAPACK: -llapack -lblas (so >>> native to my machine). >>> >>> If you can guide me on a run line for >>> the snes ex5.c I can try that too. >>> I'll also try to construct a GCC build >>> later to see if that is an issue. >>> >>> >>> Same line on ex5, but ex2 is good enough. >>> However, it will not tell us >>> anything new. Try another build. >>> >>> Matt >>> >>> -sanjay >>> >>> >>> On 12/23/12 5:58 PM, Matthew Knepley wrote: >>> >>> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay >>> Govindjee >> > >>> wrote: >>> >>> Not sure what you mean by where is >>> your matrix? I am simply running >>> ex6 >>> in the ksp/examples/tests directory. >>> >>> The reason I ran this test is >>> because I was seeing the same behavior >>> with >>> my finite element code (on perfectly >>> benign problems). >>> >>> Is there a built-in test that you >>> use to check that superlu_dist is >>> working properly with petsc? >>> i.e. something you know that works >>> with with petsc 3.3-p5? >>> >>> >>> 1) Run it on a SNES ex5 (or KSP ex2), >>> which is a nice Laplacian >>> >>> 2) Compare with MUMPS >>> >>> Matt >>> >>> -sanjay >>> >>> >>> >>> On 12/23/12 4:56 PM, Jed Brown wrote: >>> >>> Where is your matrix? It might be >>> ending up with a very bad pivot. 
If >>> the >>> problem can be reproduced, it should >>> be reported to the SuperLU_DIST >>> developers to fix. (Note that we do >>> not see this with other matrices.) >>> You >>> can also try MUMPS. >>> >>> >>> On Sun, Dec 23, 2012 at 6:48 PM, >>> Sanjay Govindjee >> > >>> wrote: >>> >>> I wanted to use SuperLU Dist to >>> perform a direct solve but seem >>> to be >>> encountering >>> a problem. I was wonder if this >>> is a know issue and if there is a >>> solution for it. >>> >>> The problem is easily observed >>> using ex6.c in >>> src/ksp/ksp/examples/tests. >>> >>> Out of the box: make runex6 >>> produces a residual error of >>> O(1e-11), all >>> is well. >>> >>> I then changed the run to run on >>> two processors and add the flag >>> -pc_factor_mat_solver_package >>> spooles this produces a >>> residual error >>> of >>> O(1e-11), all is still well. >>> >>> I then switch over to >>> -pc_factor_mat_solver_package >>> superlu_dist and >>> the >>> residual error comes back as >>> 22.6637! Something seems very >>> wrong. >>> >>> My build is perfectly vanilla: >>> >>> export >>> PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>> export PETSC_ARCH=intel >>> >>> ./configure --with-cc=icc >>> --with-fc=ifort \ >>> >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> make >>> PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel >>> all >>> make >>> PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel >>> test >>> >>> -sanjay >>> >>> >>> >>> -- >>> ----------------------------------------------- >>> Sanjay Govindjee, PhD, PE >>> Professor of Civil Engineering >>> Vice Chair for Academic Affairs >>> >>> 779 Davis Hall >>> Structural Engineering, Mechanics >>> and Materials >>> Department of Civil Engineering >>> University of California >>> Berkeley, CA 94720-1710 >>> >>> Voice: +1 510 642 6060 >>> >>> FAX: +1 510 643 5264 >>> >>> s_g at berkeley.edu >>> >>> http://www.ce.berkeley.edu/~sanjay >>> >>> ----------------------------------------------- >>> >>> New Books: >>> >>> Engineering Mechanics of Deformable >>> Solids: A Presentation with Exercises >>> >>> >>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>> http://ukcatalogue.oup.com/product/9780199651641.do >>> http://amzn.com/0199651647 >>> >>> >>> Engineering Mechanics 3 (Dynamics) >>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>> http://amzn.com/3642140181 >>> >>> ----------------------------------------------- >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted >>> before they begin their >>> experiments is infinitely more >>> interesting than any results to which >>> their >>> experiments lead. >>> -- Norbert Wiener >>> >>> >>> >>> -- >>> What most experimenters take for granted >>> before they begin their >>> experiments >>> is infinitely more interesting than any >>> results to which their >>> experiments >>> lead. 
>>> -- Norbert Wiener
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to
>>> which their experiments lead.
>>> -- Norbert Wiener >> >> -- >> ----------------------------------------------- >> Sanjay Govindjee, PhD, PE >> Professor of Civil Engineering >> Vice Chair for Academic Affairs >> >> 779 Davis Hall >> Structural Engineering, Mechanics and Materials >> Department of Civil Engineering >> University of California >> Berkeley, CA 94720-1710 >> >> Voice:+1 510 642 6060 >> FAX:+1 510 643 5264 >> s_g at berkeley.edu >> http://www.ce.berkeley.edu/~sanjay >> ----------------------------------------------- >> >> New Books: >> >> Engineering Mechanics of Deformable >> Solids: A Presentation with Exercises >> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >> http://ukcatalogue.oup.com/product/9780199651641.do >> http://amzn.com/0199651647 >> >> >> Engineering Mechanics 3 (Dynamics) >> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >> http://amzn.com/3642140181 >> >> ----------------------------------------------- > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice:+1 510 642 6060 > FAX:+1 510 643 5264 > s_g at berkeley.edu > http://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exercises > http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 > http://ukcatalogue.oup.com/product/9780199651641.do > http://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics) > http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 > http://amzn.com/3642140181 > > ----------------------------------------------- > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- ----------------------------------------------- Sanjay Govindjee, PhD, PE Professor of Civil Engineering Vice Chair for Academic Affairs 779 Davis Hall Structural Engineering, Mechanics and Materials Department of Civil Engineering University of California Berkeley, CA 94720-1710 Voice: +1 510 642 6060 FAX: +1 510 643 5264 s_g at berkeley.edu http://www.ce.berkeley.edu/~sanjay ----------------------------------------------- New Books: Engineering Mechanics of Deformable Solids: A Presentation with Exercises http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 http://ukcatalogue.oup.com/product/9780199651641.do http://amzn.com/0199651647 Engineering Mechanics 3 (Dynamics) http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 http://amzn.com/3642140181 ----------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
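A cross-check with MUMPS, which was suggested earlier in this thread, would need a run line analogous to the two above; a possible sketch only, since the configure line quoted above does not download MUMPS, so this assumes a build reconfigured with --download-mumps --download-scalapack --download-blacs:

runex2mumps:
	-@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 500 -n 500 -ksp_type preonly \
	   -pc_type lu -pc_factor_mat_solver_package mumps

If MUMPS agrees with spooles on the -m 500 -n 500 case while superlu_dist does not, that further points at the SuperLU_DIST factorization rather than the assembled matrix.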
URL: From s_g at berkeley.edu Wed Dec 26 23:00:15 2012 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 26 Dec 2012 21:00:15 -0800 Subject: [petsc-users] Using superlu_dist in a direct solve In-Reply-To: References: <50D7A664.6080802@berkeley.edu> <50D7AB15.5040606@berkeley.edu> <50D7BFF3.3030909@berkeley.edu> <50DB6892.5040402@berkeley.edu> <50DB6C05.4090006@berkeley.edu> <50DB6E50.3050001@berkeley.edu> <50DBA349.7030307@berkeley.edu> <50DBA5C0.4050807@berkeley.edu> Message-ID: <50DBD5DF.50301@berkeley.edu> fyi, here is what gets printed when I make ex2: sg-macbook-prolocal:tutorials sg$ make ex2 /Users/sg/petsc-3.3-p5/gnu_shared/bin/mpicc -o ex2.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -fno-inline -O0 -I/Users/sg/petsc-3.3-p5/include -I/Users/sg/petsc-3.3-p5/gnu_shared/include -D__INSDIR__=src/ksp/ksp/examples/tutorials/ ex2.c /Users/sg/petsc-3.3-p5/gnu_shared/bin/mpicc -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -fno-inline -O0 -o ex2 ex2.o -L/Users/sg/petsc-3.3-p5//gnu_shared/lib -L/Users/sg/petsc-3.3-p5/gnu_shared/lib -lpetsc -L/usr/X11R6/lib -lX11 -lpromfei -lprometheus -L/usr/local/lib/gcc/x86_64-apple-darwin11.4.0/4.8.0 -L/usr/local/lib -lmpichcxx -lstdc++ -lsuperlu_dist_3.1 -lparmetis -lmetis -lHYPRE -lmpichcxx -lstdc++ -lml -lmpichcxx -lstdc++ -lpthread -lspooles -llapack -lblas -ldl -lmpichf90 -lpthread -lgfortran -lgfortran -lquadmath -lm -lm -lmpichcxx -lstdc++ -lpmpich -lmpich -lopa -lmpl -lSystem -lgcc_ext.10.5 -ldl /bin/rm -f ex2.o On 12/26/12 6:46 PM, Matthew Knepley wrote: > On Wed, Dec 26, 2012 at 8:34 PM, Sanjay Govindjee > wrote: > > For what it is worth. I ran the problems with valgrind (before I > built the --with-shared-libraries version). > With spooles the run is essentially clean. With superlu I see > lots of errors of the type: > > > This looks like a well-known MPICH problem with valgrind reporting. > However, these > stacks look strange. You should have source line numbers if this is > compiled with debugging > and you should have the whole stack for MPICH. > > Also, why is libquadmath being linked? > > Matt > > ==91099== Syscall param writev(vector[...]) points to > uninitialised byte(s) > ==91099== at 0x1245FF2: writev (in > /usr/lib/system/libsystem_kernel.dylib) > ==91099== by 0x101209846: MPIDU_Sock_writev (in ./ex2) > ==91099== by 0x101A2BA23: ??? > ==91099== by 0x1FFFFFFFB: ??? > ==91099== by 0x101A2BA0F: ??? > ==91099== by 0x10852053F: ??? > ==91099== by 0x101A24907: ??? > ==91099== by 0x7FFF5FBFE2DF: ??? > ==91099== by 0x1: ??? 
> ==91099== by 0x10120AF13: MPIDI_CH3_iSendv (in ./ex2) > ==91099== Address 0x10712d0c8 is 136 bytes inside a block of size > 1,661,792 alloc'd > ==91099== at 0xC713: malloc (vg_replace_malloc.c:271) > ==91099== by 0x100D5C6DF: superlu_malloc_dist (in ./ex2) > ==91099== by 0x100D23375: doubleMalloc_dist (in ./ex2) > ==91099== by 0x100D415C1: pdgstrs (in ./ex2) > ==91099== by 0x100D3F852: pdgssvx (in ./ex2) > ==91099== by 0x1007E5D38: MatSolve_SuperLU_DIST (in ./ex2) > ==91099== by 0x1002BDA1E: MatSolve (in ./ex2) > ==91099== by 0x1009EAF55: PCApply_LU (in ./ex2) > ==91099== by 0x100AAE053: PCApply (in ./ex2) > ==91099== by 0x100B1BCEE: KSPSolve_PREONLY (in ./ex2) > ==91099== by 0x100B54F55: KSPSolve (in ./ex2) > ==91099== by 0x1000022FC: main (in ./ex2) > > > > On 12/26/12 5:24 PM, Sanjay Govindjee wrote: >> I have re-configured/built using: >> >> ./configure PETSC_ARCH=gnu_shared >> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >> --with-shared-libraries >> >> make PETSC_ARCH=gnu_shared all >> >> make PETSC_ARCH=gnu_shared test >> >> >> Using the same test problem >> (src/ksp/ksp/examples/tutorials/ex2.c), on the 100x100 case I get: >> >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2spooles >> Norm of error 2.21422e-12 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2superlu >> Norm of error 7.66145e-13 iterations 1 >> >> One the 500x500 case I get: >> >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2spooles >> Norm of error 2.69468e-10 iterations 1 >> sg-macbook-prolocal:tutorials sg$ make PETSC_ARCH=gnu_shared >> runex2superlu >> Norm of error 419.953 iterations 1 >> >> otool shows: >> >> sg-macbook-prolocal:tutorials sg$ otool -L ex2 >> ex2: >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libpetsc.dylib >> (compatibility version 0.0.0, current version 0.0.0) >> /usr/X11/lib/libX11.6.dylib (compatibility version 10.0.0, >> current version 10.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpichcxx.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /usr/local/lib/libstdc++.6.dylib (compatibility version >> 7.0.0, current version 7.17.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libparmetis.dylib >> (compatibility version 0.0.0, current version 0.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmetis.dylib >> (compatibility version 0.0.0, current version 0.0.0) >> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib >> (compatibility version 1.0.0, current version 1.0.0) >> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib >> (compatibility version 1.0.0, current version 1.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpichf90.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /usr/local/lib/libgfortran.3.dylib (compatibility version >> 4.0.0, current version 4.0.0) >> /usr/local/lib/libquadmath.0.dylib (compatibility version >> 1.0.0, current version 1.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libpmpich.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpich.dylib >> (compatibility version 0.0.0, current version 3.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libopa.1.dylib >> (compatibility version 2.0.0, current version 2.0.0) >> /Users/sg/petsc-3.3-p5/gnu_shared/lib/libmpl.1.dylib >> (compatibility version 3.0.0, current version 3.0.0) >> /usr/lib/libSystem.B.dylib 
(compatibility version 1.0.0, >> current version 159.1.0) >> /usr/local/lib/libgcc_s.1.dylib (compatibility version 1.0.0, >> current version 1.0.0) >> >> >> >> >> On 12/26/12 3:08 PM, Matthew Knepley wrote: >>> >>> On Wed, Dec 26, 2012 at 4:38 PM, Sanjay Govindjee >>> > wrote: >>> >>> I have a macbook pro (Mac OS X 10.7.5) >>> >>> % uname -a >>> Darwin sg-macbook-prolocal.local 11.4.2 Darwin Kernel >>> Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; >>> root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 >>> >>> I configured using: >>> >>> >>> ./configure --with-cc=icc --with-fc=ifort >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> so everything was built together. >>> >>> >>> Since >>> >>> a) you have tried other compilers >>> >>> b) we cannot reproduce it >>> >>> c) we are building the library during configure >>> >>> I would guess that some outside library, in your default link >>> path, is contaminating >>> the executable with symbols which override some of those in >>> SuperLU. The SuperLU >>> people are not super careful about naming. Could you >>> >>> 1) Try this same exercise using --with-shared-libraries >>> >>> 2) Once you do that, use otool -L on the executable so we can >>> see where everything comes from >>> >>> Thanks, >>> >>> Matt >>> >>> On 12/26/12 1:34 PM, Hong Zhang wrote: >>> >>> Sanjay: >>> >>> hmmm....I guess that is good news -- in that superlu >>> is not broken. However, >>> for me >>> not so good news since I seems that there is nasty >>> bug lurking on my >>> machine. >>> >>> Any suggestions on chasing down the error? >>> >>> How did you install your supelu_dist with petsc-3.3? >>> What machine do you use? >>> >>> Hong >>> >>> >>> On 12/26/12 1:23 PM, Hong Zhang wrote: >>> >>> Sanjay: >>> I get >>> petsc-3.3/src/ksp/ksp/examples/tutorials>mpiexec >>> -n 2 ./ex2 >>> -ksp_monitor_short -ksp_type preonly -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist -m >>> 500 -n 500 >>> Norm of error 1.92279e-11 iterations 1 >>> >>> Hong >>> >>> I have done some more testing of the >>> problem, continuing with >>> src/ksp/ksp/examples/tutorials/ex2.c. >>> >>> The behavior I am seeing is that with >>> smaller problems sizes superlu_dist >>> is >>> behaving properly >>> but with larger problem sizes things seem to >>> go wrong and what goes wrong >>> is >>> apparently consistent; the error appears >>> both with my intel build as well >>> as >>> with my gcc build. 
>>> >>> I have two run lines: >>> >>> runex2superlu: >>> -@${MPIEXEC} -n 2 ./ex2 >>> -ksp_monitor_short -m 100 -n 100 >>> -ksp_type >>> preonly -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist >>> >>> runex2spooles: >>> -@${MPIEXEC} -n 2 ./ex2 >>> -ksp_monitor_short -m 100 -n 100 >>> -ksp_type >>> preonly -pc_type lu >>> -pc_factor_mat_solver_package spooles >>> >>> From my intel build, I get >>> >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 7.66145e-13 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.21422e-12 iterations 1 >>> >>> From my GCC build, I get >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 7.66145e-13 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.21422e-12 iterations 1 >>> >>> If I change the -m 100 -n 100 to -m 500 -n >>> 500, I get for my intel build >>> >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 419.953 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.69468e-10 iterations 1 >>> >>> From my GCC build with -m 500 -n 500, I get >>> >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2superlu >>> Norm of error 419.953 iterations 1 >>> sg-macbook-prolocal:tutorials sg$ make >>> runex2spooles >>> Norm of error 2.69468e-10 iterations 1 >>> >>> >>> Any suggestions will be greatly appreciated. >>> >>> -sanjay >>> >>> >>> >>> >>> >>> >>> >>> On 12/23/12 6:42 PM, Matthew Knepley wrote: >>> >>> >>> On Sun, Dec 23, 2012 at 9:37 PM, Sanjay >>> Govindjee >> > >>> wrote: >>> >>> I decided to go with >>> ksp/ksp/exampeles/tutorials/ex2.c; I was >>> unsure how >>> to convert the run lines for >>> snes/examples/ex5.c to work with a direct >>> solver as I am not versed in SNES options. >>> >>> Notwithstanding something strange is >>> happening only on select examples. >>> With ksp/ksp/exampeles/tutorials/ex2.c >>> and the run line: >>> >>> -@${MPIEXEC} -n 2 ./ex2 >>> -ksp_monitor_short -m 20 -n 20 -ksp_type >>> preonly >>> -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist >>> >>> I get good results (of the order): >>> >>> Norm of error 1.85464e-14 iterations 1 >>> >>> using both superlu_dist and spooles. >>> >>> My BLAS/LAPACK: -llapack -lblas (so >>> native to my machine). >>> >>> If you can guide me on a run line for >>> the snes ex5.c I can try that too. >>> I'll also try to construct a GCC build >>> later to see if that is an issue. >>> >>> >>> Same line on ex5, but ex2 is good enough. >>> However, it will not tell us >>> anything new. Try another build. >>> >>> Matt >>> >>> -sanjay >>> >>> >>> On 12/23/12 5:58 PM, Matthew Knepley wrote: >>> >>> On Sun, Dec 23, 2012 at 8:08 PM, Sanjay >>> Govindjee >> > >>> wrote: >>> >>> Not sure what you mean by where is >>> your matrix? I am simply running >>> ex6 >>> in the ksp/examples/tests directory. >>> >>> The reason I ran this test is >>> because I was seeing the same behavior >>> with >>> my finite element code (on perfectly >>> benign problems). >>> >>> Is there a built-in test that you >>> use to check that superlu_dist is >>> working properly with petsc? >>> i.e. something you know that works >>> with with petsc 3.3-p5? >>> >>> >>> 1) Run it on a SNES ex5 (or KSP ex2), >>> which is a nice Laplacian >>> >>> 2) Compare with MUMPS >>> >>> Matt >>> >>> -sanjay >>> >>> >>> >>> On 12/23/12 4:56 PM, Jed Brown wrote: >>> >>> Where is your matrix? It might be >>> ending up with a very bad pivot. 
If >>> the >>> problem can be reproduced, it should >>> be reported to the SuperLU_DIST >>> developers to fix. (Note that we do >>> not see this with other matrices.) >>> You >>> can also try MUMPS. >>> >>> >>> On Sun, Dec 23, 2012 at 6:48 PM, >>> Sanjay Govindjee >> > >>> wrote: >>> >>> I wanted to use SuperLU Dist to >>> perform a direct solve but seem >>> to be >>> encountering >>> a problem. I was wonder if this >>> is a know issue and if there is a >>> solution for it. >>> >>> The problem is easily observed >>> using ex6.c in >>> src/ksp/ksp/examples/tests. >>> >>> Out of the box: make runex6 >>> produces a residual error of >>> O(1e-11), all >>> is well. >>> >>> I then changed the run to run on >>> two processors and add the flag >>> -pc_factor_mat_solver_package >>> spooles this produces a >>> residual error >>> of >>> O(1e-11), all is still well. >>> >>> I then switch over to >>> -pc_factor_mat_solver_package >>> superlu_dist and >>> the >>> residual error comes back as >>> 22.6637! Something seems very >>> wrong. >>> >>> My build is perfectly vanilla: >>> >>> export >>> PETSC_DIR=/Users/sg/petsc-3.3-p5/ >>> export PETSC_ARCH=intel >>> >>> ./configure --with-cc=icc >>> --with-fc=ifort \ >>> >>> -download-{spooles,parmetis,superlu_dist,prometheus,mpich,ml,hypre,metis} >>> >>> make >>> PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel >>> all >>> make >>> PETSC_DIR=/Users/sg/petsc-3.3-p5/ PETSC_ARCH=intel >>> test >>> >>> -sanjay >>> >>> >>> >>> -- >>> ----------------------------------------------- >>> Sanjay Govindjee, PhD, PE >>> Professor of Civil Engineering >>> Vice Chair for Academic Affairs >>> >>> 779 Davis Hall >>> Structural Engineering, Mechanics >>> and Materials >>> Department of Civil Engineering >>> University of California >>> Berkeley, CA 94720-1710 >>> >>> Voice: +1 510 642 6060 >>> >>> FAX: +1 510 643 5264 >>> >>> s_g at berkeley.edu >>> >>> http://www.ce.berkeley.edu/~sanjay >>> >>> ----------------------------------------------- >>> >>> New Books: >>> >>> Engineering Mechanics of Deformable >>> Solids: A Presentation with Exercises >>> >>> >>> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >>> http://ukcatalogue.oup.com/product/9780199651641.do >>> http://amzn.com/0199651647 >>> >>> >>> Engineering Mechanics 3 (Dynamics) >>> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >>> http://amzn.com/3642140181 >>> >>> ----------------------------------------------- >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted >>> before they begin their >>> experiments is infinitely more >>> interesting than any results to which >>> their >>> experiments lead. >>> -- Norbert Wiener >>> >>> >>> >>> -- >>> What most experimenters take for granted >>> before they begin their >>> experiments >>> is infinitely more interesting than any >>> results to which their >>> experiments >>> lead. 
>>> -- Norbert Wiener
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to
>>> which their experiments lead.
>>> -- Norbert Wiener >> >> -- >> ----------------------------------------------- >> Sanjay Govindjee, PhD, PE >> Professor of Civil Engineering >> Vice Chair for Academic Affairs >> >> 779 Davis Hall >> Structural Engineering, Mechanics and Materials >> Department of Civil Engineering >> University of California >> Berkeley, CA 94720-1710 >> >> Voice:+1 510 642 6060 >> FAX:+1 510 643 5264 >> s_g at berkeley.edu >> http://www.ce.berkeley.edu/~sanjay >> ----------------------------------------------- >> >> New Books: >> >> Engineering Mechanics of Deformable >> Solids: A Presentation with Exercises >> http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 >> http://ukcatalogue.oup.com/product/9780199651641.do >> http://amzn.com/0199651647 >> >> >> Engineering Mechanics 3 (Dynamics) >> http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 >> http://amzn.com/3642140181 >> >> ----------------------------------------------- > > -- > ----------------------------------------------- > Sanjay Govindjee, PhD, PE > Professor of Civil Engineering > Vice Chair for Academic Affairs > > 779 Davis Hall > Structural Engineering, Mechanics and Materials > Department of Civil Engineering > University of California > Berkeley, CA 94720-1710 > > Voice:+1 510 642 6060 > FAX:+1 510 643 5264 > s_g at berkeley.edu > http://www.ce.berkeley.edu/~sanjay > ----------------------------------------------- > > New Books: > > Engineering Mechanics of Deformable > Solids: A Presentation with Exercises > http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 > http://ukcatalogue.oup.com/product/9780199651641.do > http://amzn.com/0199651647 > > > Engineering Mechanics 3 (Dynamics) > http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 > http://amzn.com/3642140181 > > ----------------------------------------------- > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- ----------------------------------------------- Sanjay Govindjee, PhD, PE Professor of Civil Engineering Vice Chair for Academic Affairs 779 Davis Hall Structural Engineering, Mechanics and Materials Department of Civil Engineering University of California Berkeley, CA 94720-1710 Voice: +1 510 642 6060 FAX: +1 510 643 5264 s_g at berkeley.edu http://www.ce.berkeley.edu/~sanjay ----------------------------------------------- New Books: Engineering Mechanics of Deformable Solids: A Presentation with Exercises http://www.oup.com/us/catalog/general/subject/Physics/MaterialsScience/?view=usa&ci=9780199651641 http://ukcatalogue.oup.com/product/9780199651641.do http://amzn.com/0199651647 Engineering Mechanics 3 (Dynamics) http://www.springer.com/materials/mechanics/book/978-3-642-14018-1 http://amzn.com/3642140181 ----------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarua at iit.edu Thu Dec 27 00:22:28 2012 From: abarua at iit.edu (amlan barua) Date: Thu, 27 Dec 2012 00:22:28 -0600 Subject: [petsc-users] (no subject) Message-ID: Hi, Is there an analogue of VecScatterCreateToZero for DA vectors? The DMDA object has more than one degrees of freedom. If there isn't any, should I use an IS object to do the scattering? Amlan -------------- next part -------------- An HTML attachment was scrubbed... 
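Barry's recipe further down in this thread (DMDACreateNaturalVector(), DMDAGlobalToNatural{Begin,End}(), then VecScatterCreateToZero() on the natural vector) can be assembled into the following sketch; it assumes an existing DMDA da and a global vector xin on it, and the names natural, tozero and onzero are only illustrative:

/* Gather a DMDA global vector (any number of dof) onto rank 0,
   in natural ordering with the dof interlaced. */
Vec            natural, onzero;
VecScatter     tozero;
PetscErrorCode ierr;

ierr = DMDACreateNaturalVector(da, &natural);CHKERRQ(ierr);
ierr = DMDAGlobalToNaturalBegin(da, xin, INSERT_VALUES, natural);CHKERRQ(ierr);
ierr = DMDAGlobalToNaturalEnd(da, xin, INSERT_VALUES, natural);CHKERRQ(ierr);

ierr = VecScatterCreateToZero(natural, &tozero, &onzero);CHKERRQ(ierr);
ierr = VecScatterBegin(tozero, natural, onzero, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
ierr = VecScatterEnd(tozero, natural, onzero, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

/* ... work with onzero on rank 0 ... */

ierr = VecScatterDestroy(&tozero);CHKERRQ(ierr);
ierr = VecDestroy(&onzero);CHKERRQ(ierr);
ierr = VecDestroy(&natural);CHKERRQ(ierr);

As the replies below point out, a full gather onto one rank is often unnecessary; if only neighbouring values are needed, ghost updates through DMGlobalToLocalBegin/End are the lighter option.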
URL: From knepley at gmail.com Thu Dec 27 06:36:04 2012 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 27 Dec 2012 07:36:04 -0500 Subject: [petsc-users] (no subject) In-Reply-To: References: Message-ID: On Thu, Dec 27, 2012 at 1:22 AM, amlan barua wrote: > Hi, > Is there an analogue of VecScatterCreateToZero for DA vectors? The DMDA > object has more than one degrees of freedom. > If there isn't any, should I use an IS object to do the scattering? > I do not understand. Why can't you give your DA vector as input? Matt > > Amlan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Dec 27 08:18:19 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 27 Dec 2012 08:18:19 -0600 Subject: [petsc-users] (no subject) In-Reply-To: References: Message-ID: ierr = DMDACreateNaturalVector(da,&natural);CHKERRQ(ierr); ierr = DMDAGlobalToNaturalBegin(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); ierr = DMDAGlobalToNaturalEnd(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); Now do VecScatterCreateToZero() from natural and the vector will be in the natural ordering on process zero with the dof interlaced. Barry On Dec 27, 2012, at 12:22 AM, amlan barua wrote: > Hi, > Is there an analogue of VecScatterCreateToZero for DA vectors? The DMDA object has more than one degrees of freedom. > If there isn't any, should I use an IS object to do the scattering? > Amlan From thomas.witkowski at tu-dresden.de Thu Dec 27 10:10:22 2012 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 27 Dec 2012 17:10:22 +0100 Subject: [petsc-users] LU factorization and solution of independent matrices does not scale, why? In-Reply-To: <20121221220521.qbp4io8kws040o8g@mail.zih.tu-dresden.de> References: <50D37234.2040205@tu-dresden.de> <4F2AF113-B369-42AD-95B9-3D4C1E8F5CEE@mcs.anl.gov> <20121220213950.nyu4ddy1og0kkw8c@mail.zih.tu-dresden.de> <50D42D82.10603@tu-dresden.de> <20121221165112.h5x9cere68sgc488@mail.zih.tu-dresden.de> <20121221220521.qbp4io8kws040o8g@mail.zih.tu-dresden.de> Message-ID: <50DC72EE.20001@tu-dresden.de> Have anyone of you tried to reproduce this problem? Thomas Am 21.12.2012 22:05, schrieb Thomas Witkowski: > So, here it is. Just compile and run with > > mpiexec -np 64 ./ex10 -ksp_type preonly -pc_type lu > -pc_factor_mat_solver_package superlu_dist -log_summary > > 64 cores: 0.09 seconds for solving > 1024 cores: 2.6 seconds for solving > > Thomas > > > Zitat von Jed Brown : > >> Can you reproduce this in a simpler environment so that we can report >> it? >> As I understand your statement, it sounds like you could reproduce by >> changing src/ksp/ksp/examples/tutorials/ex10.c to create a subcomm of >> size >> 4 and the using that everywhere, then compare log_summary running on 4 >> cores to running on more (despite everything really being independent) >> >> It would also be worth using an MPI profiler to see if it's really >> spending >> a lot of time in MPI_Iprobe. Since SuperLU_DIST does not use >> MPI_Iprobe, it >> may be something else. >> >> On Fri, Dec 21, 2012 at 8:51 AM, Thomas Witkowski < >> Thomas.Witkowski at tu-dresden.de> wrote: >> >>> I use a modified MPICH version. On the system I use for these >>> benchmarks I >>> cannot use another MPI library. >>> >>> I'm not fixed to MUMPS. 
Superlu_dist, for example, works also perfectly >>> for this. But there is still the following problem I cannot solve: >>> When I >>> increase the number of coarse space matrices, there seems to be no >>> scaling >>> direct solver for this. Just to summaries: >>> - one coarse space matrix is created always by one "cluster" >>> consisting of >>> four subdomanins/MPI tasks >>> - the four tasks are always local to one node, thus inter-node network >>> communication is not required for computing factorization and solve >>> - independent of the number of cluster, the coarse space matrices >>> are the >>> same, have the same number of rows, nnz structure but possibly >>> different >>> values >>> - there is NO load unbalancing >>> - the matrices must be factorized and there are a lot of solves (> 100) >>> with them >>> >>> It should be pretty clear, that computing LU factorization and solving >>> with it should scale perfectly. But at the moment, all direct solver I >>> tried (mumps, superlu_dist, pastix) are not able to scale. The loos of >>> scale is really worse, as you can see from the numbers I send before. >>> >>> Any ideas? Suggestions? Without a scaling solver method for these >>> kind of >>> systems, my multilevel FETI-DP code is just more or less a joke, >>> only some >>> orders of magnitude slower than standard FETI-DP method :) >>> >>> Thomas >>> >>> Zitat von Jed Brown : >>> >>> MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). What MPI >>>> implementation have you been using? Is the behavior different with a >>>> different implementation? >>>> >>>> >>>> On Fri, Dec 21, 2012 at 2:36 AM, Thomas Witkowski < >>>> thomas.witkowski at tu-dresden.de**> wrote: >>>> >>>> Okay, I did a similar benchmark now with PETSc's event logging: >>>>> >>>>> UMFPACK >>>>> 16p: Local solve 350 1.0 2.3025e+01 1.1 5.00e+04 1.0 >>>>> 0.0e+00 >>>>> 0.0e+00 7.0e+02 63 0 0 0 52 63 0 0 0 51 0 >>>>> 64p: Local solve 350 1.0 2.3208e+01 1.1 5.00e+04 1.0 >>>>> 0.0e+00 >>>>> 0.0e+00 7.0e+02 60 0 0 0 52 60 0 0 0 51 0 >>>>> 256p: Local solve 350 1.0 2.3373e+01 1.1 5.00e+04 1.0 >>>>> 0.0e+00 >>>>> 0.0e+00 7.0e+02 49 0 0 0 52 49 0 0 0 51 1 >>>>> >>>>> MUMPS >>>>> 16p: Local solve 350 1.0 4.7183e+01 1.1 5.00e+04 1.0 >>>>> 0.0e+00 >>>>> 0.0e+00 7.0e+02 75 0 0 0 52 75 0 0 0 51 0 >>>>> 64p: Local solve 350 1.0 7.1409e+01 1.1 5.00e+04 1.0 >>>>> 0.0e+00 >>>>> 0.0e+00 7.0e+02 78 0 0 0 52 78 0 0 0 51 0 >>>>> 256p: Local solve 350 1.0 2.6079e+02 1.1 5.00e+04 1.0 >>>>> 0.0e+00 >>>>> 0.0e+00 7.0e+02 82 0 0 0 52 82 0 0 0 51 0 >>>>> >>>>> >>>>> As you see, the local solves with UMFPACK have nearly constant >>>>> time with >>>>> increasing number of subdomains. This is what I expect. The I replace >>>>> UMFPACK by MUMPS and I see increasing time for local solves. In >>>>> the last >>>>> columns, UMFPACK has a decreasing value from 63 to 49, while MUMPS's >>>>> column >>>>> increases here from 75 to 82. What does this mean? >>>>> >>>>> Thomas >>>>> >>>>> Am 21.12.2012 02:19, schrieb Matthew Knepley: >>>>> >>>>> On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski >>>>> >>>>>> >>>>> >>>>>> >> >>>>>> >>>>>> wrote: >>>>>> >>>>>> I cannot use the information from log_summary, as I have three >>>>>>> different >>>>>>> LU >>>>>>> factorizations and solve (local matrices and two hierarchies of >>>>>>> coarse >>>>>>> grids). Therefore, I use the following work around to get the >>>>>>> timing of >>>>>>> the >>>>>>> solve I'm intrested in: >>>>>>> >>>>>>> You misunderstand how to use logging. You just put these thing in >>>>>> separate stages. 
Stages represent >>>>>> parts of the code over which events are aggregated. >>>>>> >>>>>> Matt >>>>>> >>>>>> MPI::COMM_WORLD.Barrier(); >>>>>> >>>>>>> wtime = MPI::Wtime(); >>>>>>> KSPSolve(*(data->ksp_schur_****primal_local), tmp_primal, >>>>>>> >>>>>>> tmp_primal); >>>>>>> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime); >>>>>>> >>>>>>> The factorization is done explicitly before with "KSPSetUp", so >>>>>>> I can >>>>>>> measure the time for LU factorization. It also does not scale! >>>>>>> For 64 >>>>>>> cores, >>>>>>> I takes 0.05 seconds, for 1024 cores 1.2 seconds. In all >>>>>>> calculations, >>>>>>> the >>>>>>> local coarse space matrices defined on four cores have exactly >>>>>>> the same >>>>>>> number of rows and exactly the same number of non zero entries. So, >>>>>>> from >>>>>>> my >>>>>>> point of view, the time should be absolutely constant. >>>>>>> >>>>>>> Thomas >>>>>>> >>>>>>> Zitat von Barry Smith : >>>>>>> >>>>>>> >>>>>>> Are you timing ONLY the time to factor and solve the >>>>>>> subproblems? >>>>>>> Or >>>>>>> >>>>>>>> also the time to get the data to the collection of 4 cores at >>>>>>>> a time? >>>>>>>> >>>>>>>> If you are only using LU for these problems and not >>>>>>>> elsewhere in >>>>>>>> the >>>>>>>> code you can get the factorization and time from MatLUFactor() >>>>>>>> and >>>>>>>> MatSolve() or you can use stages to put this calculation in >>>>>>>> its own >>>>>>>> stage >>>>>>>> and use the MatLUFactor() and MatSolve() time from that stage. >>>>>>>> Also look at the load balancing column for the factorization and >>>>>>>> solve >>>>>>>> stage, it is well balanced? >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski >>>>>>>> >>>>>>> >>>>>>>> >> >>>>>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>> In my multilevel FETI-DP code, I have localized course matrices, >>>>>>>> which >>>>>>>> >>>>>>>>> are defined on only a subset of all MPI tasks, typically >>>>>>>>> between 4 >>>>>>>>> and 64 >>>>>>>>> tasks. The MatAIJ and the KSP objects are both defined on a MPI >>>>>>>>> communicator, which is a subset of MPI::COMM_WORLD. The LU >>>>>>>>> factorization of >>>>>>>>> the matrices is computed with either MUMPS or superlu_dist, >>>>>>>>> but both >>>>>>>>> show >>>>>>>>> some scaling property I really wonder of: When the overall >>>>>>>>> problem >>>>>>>>> size is >>>>>>>>> increased, the solve with the LU factorization of the local >>>>>>>>> matrices >>>>>>>>> does >>>>>>>>> not scale! But why not? I just increase the number of local >>>>>>>>> matrices, >>>>>>>>> but >>>>>>>>> all of them are independent of each other. Some example: I use 64 >>>>>>>>> cores, >>>>>>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI >>>>>>>>> communicators >>>>>>>>> with 16 coarse space matrices. The problem need to solve 192 >>>>>>>>> times >>>>>>>>> with the >>>>>>>>> coarse space systems, and this takes together 0.09 seconds. >>>>>>>>> Now I >>>>>>>>> increase >>>>>>>>> the number of cores to 256, but let the local coarse space be >>>>>>>>> defined >>>>>>>>> again >>>>>>>>> on only 4 cores. Again, 192 solutions with these coarse >>>>>>>>> spaces are >>>>>>>>> required, but now this takes 0.24 seconds. The same for 1024 >>>>>>>>> cores, >>>>>>>>> and we >>>>>>>>> are at 1.7 seconds for the local coarse space solver! >>>>>>>>> >>>>>>>>> For me, this is a total mystery! Any idea how to explain, >>>>>>>>> debug and >>>>>>>>> eventually how to resolve this problem? 
>>>>>>>>> >>>>>>>>> Thomas >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which >>>>>> their experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >> > > From bsmith at mcs.anl.gov Thu Dec 27 10:40:44 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 27 Dec 2012 10:40:44 -0600 Subject: [petsc-users] (no subject) In-Reply-To: References: Message-ID: <118BE79A-7DD9-4D02-8EE7-650010BAF1D2@mcs.anl.gov> On Dec 27, 2012, at 10:34 AM, amlan barua wrote: > I think I can use VecSetValues, is that right? Yes you could do that. But since you are using a DMDA you could also use DMGetLocalVector(), DMGlobalToLocalBegin/End() followed by DMDAVecGetArray() to access the ghost values. Barry > Amlan > > > On Thu, Dec 27, 2012 at 9:04 AM, amlan barua wrote: > Hi Barry, > Is this scattering a very costly operation? I have to compute x[i] = f(x[i-1]) where f is known. Since this operation is strictly sequential, I thought of gathering the entire vector on processor 0, do the sequential operation there and scatter the result back. However this is unnecessary because I only need the bordering x[i] values. What can be a better way? > Amlan > > > On Thu, Dec 27, 2012 at 8:18 AM, Barry Smith wrote: > > ierr = DMDACreateNaturalVector(da,&natural);CHKERRQ(ierr); > ierr = DMDAGlobalToNaturalBegin(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); > ierr = DMDAGlobalToNaturalEnd(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); > > Now do VecScatterCreateToZero() from natural and the vector will be in the natural ordering on process zero with the dof interlaced. > > > Barry > > On Dec 27, 2012, at 12:22 AM, amlan barua wrote: > > > Hi, > > Is there an analogue of VecScatterCreateToZero for DA vectors? The DMDA object has more than one degrees of freedom. > > If there isn't any, should I use an IS object to do the scattering? > > Amlan > > > From jefonseca at gmail.com Fri Dec 28 10:58:02 2012 From: jefonseca at gmail.com (Jim Fonseca) Date: Fri, 28 Dec 2012 11:58:02 -0500 Subject: [petsc-users] how to determine if complex matrix has imaginary components Message-ID: Hi, Is there a computationally fast way to determine if the imaginary components of a matrix are zero or very small? Thanks, Jim -- Jim Fonseca, PhD Research Scientist Network for Computational Nanotechnology Purdue University 765-496-6495 www.jimfonseca.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Dec 28 11:37:48 2012 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Dec 2012 12:37:48 -0500 Subject: [petsc-users] how to determine if complex matrix has imaginary components In-Reply-To: References: Message-ID: On Fri, Dec 28, 2012 at 11:58 AM, Jim Fonseca wrote: > Hi, > Is there a computationally fast way to determine if the imaginary > components of a matrix are zero or very small? > I can't think of anything faster than checking each entry. Matt > Thanks, > Jim > > > -- > Jim Fonseca, PhD > Research Scientist > Network for Computational Nanotechnology > Purdue University > 765-496-6495 > www.jimfonseca.com > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
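On the question just above about detecting negligible imaginary parts: as the reply below notes, there is little alternative to sweeping the stored entries. A minimal sketch, assuming a complex-scalar build of PETSc; the helper name and the tolerance argument are only illustrative:

#include <petscmat.h>

/* Returns PETSC_TRUE in *realish if every stored entry of A has
   |imaginary part| <= tol. Each process checks only its own rows,
   then the results are combined with an allreduce. */
PetscErrorCode MatImaginaryPartNegligible(Mat A, PetscReal tol, PetscBool *realish)
{
  PetscErrorCode    ierr;
  PetscInt          rstart, rend, i, j, ncols;
  PetscInt          ok = 1, gok;
  const PetscScalar *vals;
  MPI_Comm          comm;

  PetscFunctionBegin;
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend && ok; i++) {
    ierr = MatGetRow(A, i, &ncols, PETSC_NULL, &vals);CHKERRQ(ierr);
    for (j = 0; j < ncols; j++) {
      if (PetscAbsReal(PetscImaginaryPart(vals[j])) > tol) { ok = 0; break; }
    }
    ierr = MatRestoreRow(A, i, &ncols, PETSC_NULL, &vals);CHKERRQ(ierr);
  }
  ierr = PetscObjectGetComm((PetscObject)A, &comm);CHKERRQ(ierr);
  ierr = MPI_Allreduce(&ok, &gok, 1, MPIU_INT, MPI_LAND, comm);CHKERRQ(ierr);
  *realish = gok ? PETSC_TRUE : PETSC_FALSE;
  PetscFunctionReturn(0);
}

For an AIJ matrix this touches each stored nonzero exactly once, so the cost is proportional to the number of nonzeros.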
URL: From slivkaje at gmail.com Fri Dec 28 16:01:03 2012 From: slivkaje at gmail.com (Jelena Slivka) Date: Fri, 28 Dec 2012 23:01:03 +0100 Subject: [petsc-users] (no subject) Message-ID: Hello! I have a few simple questions about PETSc about functions that I can't seem to find in the documentation: 1) Is there a way to automatically create a matrix in which all elements are the same scalar value a, e.g. something like ones(m,n) in Matlab? 2) Is there an equivalent to Matlab .* operator? 3) Is there a function that can create matrix C by appending matrices A and B? Grateful in advance -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Fri Dec 28 17:02:47 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Fri, 28 Dec 2012 17:02:47 -0600 Subject: [petsc-users] (no subject) In-Reply-To: References: Message-ID: On Fri, Dec 28, 2012 at 4:01 PM, Jelena Slivka wrote: > Hello! > I have a few simple questions about PETSc about functions that I can't > seem to find in the documentation: > 1) Is there a way to automatically create a matrix in which all elements > are the same scalar value a, e.g. something like ones(m,n) in Matlab? > That matrix (or any very low rank matrix) should not be stored explicitly as a dense matrix. 2) Is there an equivalent to Matlab .* operator? > There is not a MatPointwiseMult(). It could be added, but I'm not aware of a use for this operator outside of matrix misuse (using a Mat to represent an array of numbers that are not an operator, thus should really be a Vec, perhaps managed using DMDA). > 3) Is there a function that can create matrix C by appending matrices A > and B? > Grateful in advance > Block matrices can be manipulated efficiently using MATNEST, but there is a very high probability of misuse unless you really understand why that is an appropriate data structure. Much more likely, you should create a matrix of size C, then assemble the parts of A and B into it, perhaps using MatGetLocalSubMatrix() so that the assembly "looks" like assembling A and B separately. Note that in parallel, you almost never want "concatenation" in the matrix sense of [A B; C D]. Instead, you want that there is some row and column permutation in which the operation would be concatenation, but in reality, the matrices are actually interleaved with some granularity so that both are well-distributed on the parallel machine. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarua at iit.edu Sat Dec 29 01:40:26 2012 From: abarua at iit.edu (amlan barua) Date: Sat, 29 Dec 2012 01:40:26 -0600 Subject: [petsc-users] (no subject) In-Reply-To: <118BE79A-7DD9-4D02-8EE7-650010BAF1D2@mcs.anl.gov> References: <118BE79A-7DD9-4D02-8EE7-650010BAF1D2@mcs.anl.gov> Message-ID: Hi Barry, I wrote the following piece according to your suggestions. Currently it does nothing but creates a vector with 1 at 1th position, 2 at 2th and so on. But I made it serial, i.e. (n+1)th place is computed using the value of nth place. My question, did I do it correctly, i.e. is it safe or results may change depending on problem size? This is much faster than VecSetValues, I believe the communication is minimum here because I take the advantage of ghost points. 
Amlan PetscInitialize(&argc,&argv,(char *)0,help); ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size); CHKERRQ(ierr); ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr); ierr = DMDACreate1d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,size*5,1,1,PETSC_NULL,&da); CHKERRQ(ierr); ierr = DMCreateGlobalVector(da,&vec); CHKERRQ(ierr); ierr = VecSet(vec,1.00); ierr = DMCreateLocalVector(da,&local); ierr = DMDAGetLocalInfo(da,&info); ierr = DMDAVecGetArray(da,vec,&arr); ierr = DMDAVecGetArray(da,local,&array); temp = 1; for (j=0;j wrote: > > On Dec 27, 2012, at 10:34 AM, amlan barua wrote: > > > I think I can use VecSetValues, is that right? > > Yes you could do that. But since you are using a DMDA you could also > use DMGetLocalVector(), DMGlobalToLocalBegin/End() followed by > DMDAVecGetArray() to access the ghost values. > > Barry > > > Amlan > > > > > > On Thu, Dec 27, 2012 at 9:04 AM, amlan barua wrote: > > Hi Barry, > > Is this scattering a very costly operation? I have to compute x[i] = > f(x[i-1]) where f is known. Since this operation is strictly sequential, I > thought of gathering the entire vector on processor 0, do the sequential > operation there and scatter the result back. However this is unnecessary > because I only need the bordering x[i] values. What can be a better way? > > Amlan > > > > > > On Thu, Dec 27, 2012 at 8:18 AM, Barry Smith wrote: > > > > ierr = DMDACreateNaturalVector(da,&natural);CHKERRQ(ierr); > > ierr = > DMDAGlobalToNaturalBegin(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); > > ierr = > DMDAGlobalToNaturalEnd(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); > > > > Now do VecScatterCreateToZero() from natural and the vector will be in > the natural ordering on process zero with the dof interlaced. > > > > > > Barry > > > > On Dec 27, 2012, at 12:22 AM, amlan barua wrote: > > > > > Hi, > > > Is there an analogue of VecScatterCreateToZero for DA vectors? The > DMDA object has more than one degrees of freedom. > > > If there isn't any, should I use an IS object to do the scattering? > > > Amlan > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sat Dec 29 10:19:50 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 29 Dec 2012 10:19:50 -0600 Subject: [petsc-users] (no subject) In-Reply-To: References: <118BE79A-7DD9-4D02-8EE7-650010BAF1D2@mcs.anl.gov> Message-ID: On Sat, Dec 29, 2012 at 1:40 AM, amlan barua wrote: > Hi Barry, > I wrote the following piece according to your suggestions. Currently it > does nothing but creates a vector with 1 at 1th position, 2 at 2th and so > on. But I made it serial, i.e. (n+1)th place is computed using the value of > nth place. My question, did I do it correctly, i.e. is it safe or results > may change depending on problem size? This is much faster than > VecSetValues, I believe the communication is minimum here because I take > the advantage of ghost points. > Amlan > > PetscInitialize(&argc,&argv,(char *)0,help); > ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size); CHKERRQ(ierr); > ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr); > ierr = > DMDACreate1d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,size*5,1,1,PETSC_NULL,&da); > CHKERRQ(ierr); > ierr = DMCreateGlobalVector(da,&vec); CHKERRQ(ierr); > ierr = VecSet(vec,1.00); > ierr = DMCreateLocalVector(da,&local); > ierr = DMDAGetLocalInfo(da,&info); > ierr = DMDAVecGetArray(da,vec,&arr); > ierr = DMDAVecGetArray(da,local,&array); > temp = 1; > for (j=0;j This is needlessly sequential (or should be). 
> ierr = DMGlobalToLocalBegin(da,vec,INSERT_VALUES,local); > CHKERRQ(ierr); > ierr = DMGlobalToLocalEnd(da,vec,INSERT_VALUES,local); > CHKERRQ(ierr); > You should never use a communication routine while you have access to the array (*VecGetArray()). > if (rank==j) { > for (i=info.xs;i if ((!i)==0) { > array[i] = array[i] + array[i-1]; > What sort of recurrence do you actually want to implement. When possible, it's much better to reorganize so that you can do local work followed by an MPI_Scan followed by more local work. MPI_Scan is fast (logarithmic). > arr[i] = array[i]; > } > } > } > } > ierr = DMDAVecRestoreArray(da,local,&array); > ierr = DMDAVecRestoreArray(da,vec,&arr); > ierr = VecView(vec,PETSC_VIEWER_STDOUT_WORLD); > PetscFinalize(); > return 0; > > > > On Thu, Dec 27, 2012 at 10:40 AM, Barry Smith wrote: > >> >> On Dec 27, 2012, at 10:34 AM, amlan barua wrote: >> >> > I think I can use VecSetValues, is that right? >> >> Yes you could do that. But since you are using a DMDA you could also >> use DMGetLocalVector(), DMGlobalToLocalBegin/End() followed by >> DMDAVecGetArray() to access the ghost values. >> >> Barry >> >> > Amlan >> > >> > >> > On Thu, Dec 27, 2012 at 9:04 AM, amlan barua wrote: >> > Hi Barry, >> > Is this scattering a very costly operation? I have to compute x[i] = >> f(x[i-1]) where f is known. Since this operation is strictly sequential, I >> thought of gathering the entire vector on processor 0, do the sequential >> operation there and scatter the result back. However this is unnecessary >> because I only need the bordering x[i] values. What can be a better way? >> > Amlan >> > >> > >> > On Thu, Dec 27, 2012 at 8:18 AM, Barry Smith >> wrote: >> > >> > ierr = DMDACreateNaturalVector(da,&natural);CHKERRQ(ierr); >> > ierr = >> DMDAGlobalToNaturalBegin(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); >> > ierr = >> DMDAGlobalToNaturalEnd(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); >> > >> > Now do VecScatterCreateToZero() from natural and the vector will be in >> the natural ordering on process zero with the dof interlaced. >> > >> > >> > Barry >> > >> > On Dec 27, 2012, at 12:22 AM, amlan barua wrote: >> > >> > > Hi, >> > > Is there an analogue of VecScatterCreateToZero for DA vectors? The >> DMDA object has more than one degrees of freedom. >> > > If there isn't any, should I use an IS object to do the scattering? >> > > Amlan >> > >> > >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sat Dec 29 12:59:54 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 29 Dec 2012 12:59:54 -0600 Subject: [petsc-users] Direct Schur complement domain decomposition In-Reply-To: <002b01cdde05$6a2d0330$3e870990$@tuhh.de> References: <002b01cdde05$6a2d0330$3e870990$@tuhh.de> Message-ID: Sorry for the slow reply. What you are describing _is_ multifrontal factorization, or alternatively, (non-iterative) substructuring. It is a direct solve and boils down to a few large dense direct solves. Incomplete factorization is one way of preventing the Schur complements from getting too dense, but it's not very reliable. There are many other ways of retaining structure in the supernodes (i.e., avoid unstructured dense matrices), at the expense of some error. These methods "compress" the Schur complement using low-rank representations for long-range interaction. These are typically combined with an iterative method. 
Multigrid and multilevel DD methods can be thought of as an alternate way to compress (approximately) the long-range interaction coming from inexact elimination (dimensional reduction of interfaces). On Wed, Dec 19, 2012 at 10:25 AM, Stefan Kurzbach wrote: > Hello everybody,**** > > ** ** > > in my recent research on parallelization of a 2D unstructured flow model > code I came upon a question on domain decomposition techniques in ?grids?. > Maybe someone knows of any previous results on this?**** > > ** ** > > Typically, when doing large simulations with many unknowns, the problem is > distributed to many computer nodes and solved in parallel by some iterative > method. Many of these iterative methods boil down to a large number of > distributed matrix-vector multiplications (in the order of the number of > iterations). This means there are many synchronization points in the > algorithms, which makes them tightly coupled. This has been found to work > well on clusters with fast networks.**** > > ** ** > > Now my question:**** > > What if there is a small number of very powerful nodes (say less than 10), > which are connected by a slow network, e.g. several computer clusters > connected over the internet (some people call this ?grid computing?). I > expect that the traditional iterative methods will not be as efficient here > (any references?).**** > > ** ** > > My guess is that a solution method with fewer synchronization points will > work better, even though that method may be computationally more expensive > than traditional methods. An example would be a domain composition approach > with direct solution of the Schur complement on the interface. This > requires that the interface size has to be small compared to the subdomain > size. As this algorithm basically works in three decoupled phases (solve > the subdomains for several right hand sides, assemble and solve the Schur > complement system, correct the subdomain results) it should be suited well, > but I have no idea how to test or otherwise prove it. Has anybody made any > thoughts on this before, possibly dating back to the 80ies and 90ies, where > slow networks were more common?**** > > ** ** > > Best regards**** > > Stefan**** > > ** ** > > ** ** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Dec 29 15:21:00 2012 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 29 Dec 2012 15:21:00 -0600 Subject: [petsc-users] Direct Schur complement domain decomposition In-Reply-To: References: <002b01cdde05$6a2d0330$3e870990$@tuhh.de> Message-ID: <9FFBA092-74B0-4CF3-AF65-45A001FDAC2E@mcs.anl.gov> My off the cuff response is that "computing the exact Schur complements for the subdomains is sooooo expensive that it swamps out any savings in reducing the amount of communication" plus it requires soooo much memory. Thus solvers like these may make sense only when the problem is "non-standard" enough that iterative methods simply don't work (perhaps due to extreme ill-conditioning), such problems do exist but for most "PDE" problems with enough time and effort one can cook up the right combination of "block-splittings" and multilevel (multigrid) methods to get a much more efficient solver that gives you the accuracy you need long before the Schur complements have been computed. Barry On Dec 29, 2012, at 12:59 PM, Jed Brown wrote: > Sorry for the slow reply. What you are describing _is_ multifrontal factorization, or alternatively, (non-iterative) substructuring. 
It is a direct solve and boils down to a few large dense direct solves. Incomplete factorization is one way of preventing the Schur complements from getting too dense, but it's not very reliable. > > There are many other ways of retaining structure in the supernodes (i.e., avoid unstructured dense matrices), at the expense of some error. These methods "compress" the Schur complement using low-rank representations for long-range interaction. These are typically combined with an iterative method. > > Multigrid and multilevel DD methods can be thought of as an alternate way to compress (approximately) the long-range interaction coming from inexact elimination (dimensional reduction of interfaces). > > On Wed, Dec 19, 2012 at 10:25 AM, Stefan Kurzbach wrote: > Hello everybody, > > > > in my recent research on parallelization of a 2D unstructured flow model code I came upon a question on domain decomposition techniques in ?grids?. Maybe someone knows of any previous results on this? > > > > Typically, when doing large simulations with many unknowns, the problem is distributed to many computer nodes and solved in parallel by some iterative method. Many of these iterative methods boil down to a large number of distributed matrix-vector multiplications (in the order of the number of iterations). This means there are many synchronization points in the algorithms, which makes them tightly coupled. This has been found to work well on clusters with fast networks. > > > > Now my question: > > What if there is a small number of very powerful nodes (say less than 10), which are connected by a slow network, e.g. several computer clusters connected over the internet (some people call this ?grid computing?). I expect that the traditional iterative methods will not be as efficient here (any references?). > > > > My guess is that a solution method with fewer synchronization points will work better, even though that method may be computationally more expensive than traditional methods. An example would be a domain composition approach with direct solution of the Schur complement on the interface. This requires that the interface size has to be small compared to the subdomain size. As this algorithm basically works in three decoupled phases (solve the subdomains for several right hand sides, assemble and solve the Schur complement system, correct the subdomain results) it should be suited well, but I have no idea how to test or otherwise prove it. Has anybody made any thoughts on this before, possibly dating back to the 80ies and 90ies, where slow networks were more common? > > > > Best regards > > Stefan > > > > > > From slivkaje at gmail.com Sat Dec 29 21:41:47 2012 From: slivkaje at gmail.com (Jelena Slivka) Date: Sun, 30 Dec 2012 04:41:47 +0100 Subject: [petsc-users] MatAXPY Segmentation violation Message-ID: Hello, I am experiencing the strange behavior when calling the MatAXPY function. Here is my code: matrix similarity is a square matrix (n=m) I create the matrix aux that has all zero elements, except for the diagonal. The elements of the diagonal in matrix aux are sums of rows in matrix similarity. 
MatSetFromOptions(similarity); int n, m; MatGetSize(similarity, &n, &m); Vec tmp; VecCreate(PETSC_COMM_WORLD, &tmp); VecSetSizes(tmp, PETSC_DECIDE, n); VecSetFromOptions(tmp); MatGetRowSum(similarity, tmp); Mat aux; MatCreate(PETSC_COMM_WORLD, &aux); MatSetSizes(aux, PETSC_DECIDE, PETSC_DECIDE, n, m); MatSetFromOptions(aux); MatSetUp(aux); MatZeroEntries(aux); MatDiagonalSet(aux, tmp, INSERT_VALUES); VecDestroy(&tmp); MatAXPY(aux, -1, similarity, DIFFERENT_NONZERO_PATTERN); If I execute this code using only one process I get the segmentation violation error: [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] MatAXPYGetPreallocation_SeqAIJ line 2562 src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: [0] MatAXPY_SeqAIJ line 2587 src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: [0] MatAXPY line 29 src/mat/utils/axpy.c [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Signal received! However, if I run the same code using two processes it runs ok and gives the good result. Could you please tell me what am I doing wrong? Grateful in advance -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sat Dec 29 21:52:33 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 29 Dec 2012 21:52:33 -0600 Subject: [petsc-users] MatAXPY Segmentation violation In-Reply-To: References: Message-ID: You might rather use MatDuplicate(similarity,MAT_DO_NOT_COPY_VALUES,&aux). Can you try these? 1. using the debugger to get a stack trace 2. run in valgrind to check for memory errors 3. set up a test case so we can reproduce On Sat, Dec 29, 2012 at 9:41 PM, Jelena Slivka wrote: > Hello, > > I am experiencing the strange behavior when calling the MatAXPY function. > Here is my code: > matrix similarity is a square matrix (n=m) > I create the matrix aux that has all zero elements, except for the > diagonal. The elements of the diagonal in matrix aux are sums of rows in > matrix similarity. 
> > MatSetFromOptions(similarity); > int n, m; > MatGetSize(similarity, &n, &m); > > Vec tmp; > VecCreate(PETSC_COMM_WORLD, &tmp); > VecSetSizes(tmp, PETSC_DECIDE, n); > VecSetFromOptions(tmp); > MatGetRowSum(similarity, tmp); > > Mat aux; > MatCreate(PETSC_COMM_WORLD, &aux); > MatSetSizes(aux, PETSC_DECIDE, PETSC_DECIDE, n, m); > MatSetFromOptions(aux); > MatSetUp(aux); > MatZeroEntries(aux); > MatDiagonalSet(aux, tmp, INSERT_VALUES); > VecDestroy(&tmp); > > MatAXPY(aux, -1, similarity, DIFFERENT_NONZERO_PATTERN); > > If I execute this code using only one process I get the segmentation > violation error: > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSCERROR: or try > http://valgrind.org on GNU/linux and Apple Mac OS X to find memory > corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] MatAXPYGetPreallocation_SeqAIJ line 2562 > src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: [0] MatAXPY_SeqAIJ line 2587 src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: [0] MatAXPY line 29 src/mat/utils/axpy.c > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Signal received! > > However, if I run the same code using two processes it runs ok and gives > the good result. > Could you please tell me what am I doing wrong? > Grateful in advance > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From slivkaje at gmail.com Sat Dec 29 22:04:09 2012 From: slivkaje at gmail.com (Jelena Slivka) Date: Sun, 30 Dec 2012 05:04:09 +0100 Subject: [petsc-users] MatAXPY Segmentation violation In-Reply-To: References: Message-ID: Thank you very much! Using MatDuplicate solved the problem. On Sun, Dec 30, 2012 at 4:52 AM, Jed Brown wrote: > You might rather use MatDuplicate(similarity,MAT_DO_NOT_COPY_VALUES,&aux). > > Can you try these? > > 1. using the debugger to get a stack trace > 2. run in valgrind to check for memory errors > 3. set up a test case so we can reproduce > > > > On Sat, Dec 29, 2012 at 9:41 PM, Jelena Slivka wrote: > >> Hello, >> >> I am experiencing the strange behavior when calling the MatAXPY function. >> Here is my code: >> matrix similarity is a square matrix (n=m) >> I create the matrix aux that has all zero elements, except for the >> diagonal. The elements of the diagonal in matrix aux are sums of rows in >> matrix similarity. 
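For readers looking for the concrete change, a minimal sketch of the MatDuplicate variant Jed suggests is given here. It assumes "similarity" is an already assembled square AIJ matrix, and error checking (ierr/CHKERRQ) is omitted as in the original post:

Vec rowsum;
Mat aux;
PetscInt n, m;

MatGetSize(similarity, &n, &m);
/* rowsum must share the row layout of similarity; PETSC_DECIDE matches the default layout used here */
VecCreate(PETSC_COMM_WORLD, &rowsum);
VecSetSizes(rowsum, PETSC_DECIDE, n);
VecSetFromOptions(rowsum);
MatGetRowSum(similarity, rowsum);            /* rowsum(i) = sum_j similarity(i,j) */

/* aux inherits the parallel layout and nonzero pattern of similarity, with values
   set to zero, instead of being created with no preallocated structure */
MatDuplicate(similarity, MAT_DO_NOT_COPY_VALUES, &aux);
MatDiagonalSet(aux, rowsum, INSERT_VALUES);  /* aux = D */
MatAXPY(aux, -1.0, similarity, DIFFERENT_NONZERO_PATTERN);  /* aux = D - S */

VecDestroy(&rowsum);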
>> >> MatSetFromOptions(similarity); >> int n, m; >> MatGetSize(similarity, &n, &m); >> >> Vec tmp; >> VecCreate(PETSC_COMM_WORLD, &tmp); >> VecSetSizes(tmp, PETSC_DECIDE, n); >> VecSetFromOptions(tmp); >> MatGetRowSum(similarity, tmp); >> >> Mat aux; >> MatCreate(PETSC_COMM_WORLD, &aux); >> MatSetSizes(aux, PETSC_DECIDE, PETSC_DECIDE, n, m); >> MatSetFromOptions(aux); >> MatSetUp(aux); >> MatZeroEntries(aux); >> MatDiagonalSet(aux, tmp, INSERT_VALUES); >> VecDestroy(&tmp); >> >> MatAXPY(aux, -1, similarity, DIFFERENT_NONZERO_PATTERN); >> >> If I execute this code using only one process I get the segmentation >> violation error: >> >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [0]PETSC ERROR: or see >> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSCERROR: or try >> http://valgrind.org on GNU/linux and Apple Mac OS X to find memory >> corruption errors >> [0]PETSC ERROR: likely location of problem given in stack below >> [0]PETSC ERROR: --------------------- Stack Frames >> ------------------------------------ >> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not >> available, >> [0]PETSC ERROR: INSTEAD the line number of the start of the function >> [0]PETSC ERROR: is given. >> [0]PETSC ERROR: [0] MatAXPYGetPreallocation_SeqAIJ line 2562 >> src/mat/impls/aij/seq/aij.c >> [0]PETSC ERROR: [0] MatAXPY_SeqAIJ line 2587 src/mat/impls/aij/seq/aij.c >> [0]PETSC ERROR: [0] MatAXPY line 29 src/mat/utils/axpy.c >> [0]PETSC ERROR: --------------------- Error Message >> ------------------------------------ >> [0]PETSC ERROR: Signal received! >> >> However, if I run the same code using two processes it runs ok and gives >> the good result. >> Could you please tell me what am I doing wrong? >> Grateful in advance >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Sat Dec 29 23:32:04 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Sat, 29 Dec 2012 23:32:04 -0600 Subject: [petsc-users] (no subject) In-Reply-To: References: <118BE79A-7DD9-4D02-8EE7-650010BAF1D2@mcs.anl.gov> Message-ID: On Sat, Dec 29, 2012 at 11:15 PM, amlan barua wrote: > Hi, > I am actually trying to implement a 'parallel' ordinary differential > equation solver. > For proper functioning of the algorithm (the name is parareal), I need to > implement a simple recurrence relation of the form x[i+1] = f(x[i]), f > known, depends on quadrature one would like to use. > What is the best way to implement a sequential operation on a parallel > structure? > Somehow I need to keep all but one process idle which. > It's not very parallel when all but one process is idle. ;-D > So I wrote a loop over all processes and within the loop I forced only one > process to update its part. Say when j=0 ideally only the processor 0 > should work. But others will update their local j value before 0th > processor much faster. Thus I am concerned about the safety of the > operation. Will it be okay if I modify my code as following, please advice? > > Yes, this works, but the performance won't be great when using DMGlobalToLocal despite only really wanting to send the update to the next process in the sequence. (Maybe you don't care about performance yet, or this particular part won't be performance-sensitive.) 
> PetscInitialize(&argc,&argv,(char *)0,help); > ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size); CHKERRQ(ierr); > ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr); > ierr = > DMDACreate1d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,size*5,1,1,PETSC_NULL,&da); > CHKERRQ(ierr); > ierr = DMCreateGlobalVector(da,&vec); CHKERRQ(ierr); > ierr = VecSet(vec,1.00); > ierr = DMCreateLocalVector(da,&local); > ierr = DMDAGetLocalInfo(da,&info); > temp = 1; > for (j=0;j ierr = DMGlobalToLocalBegin(da,vec,INSERT_VALUES,local); > CHKERRQ(ierr); > ierr = DMGlobalToLocalEnd(da,vec,INSERT_VALUES,local); > CHKERRQ(ierr); > ierr = DMDAVecGetArray(da,vec,&arr); > ierr = DMDAVecGetArray(da,local,&array); > if (rank==j) { > for (i=info.xs;i if ((!i)==0) { > I would write if (i) unless I was intentionally trying to confuse the reader. > array[i] = array[i] + array[i-1]; > arr[i] = array[i]; > } > } > } > ierr = DMDAVecRestoreArray(da,local,&array); > ierr = DMDAVecRestoreArray(da,vec,&arr); > } > ierr = VecView(vec,PETSC_VIEWER_STDOUT_WORLD); > PetscFinalize(); > return 0; > Amlan > > > On Sat, Dec 29, 2012 at 10:19 AM, Jed Brown wrote: > >> On Sat, Dec 29, 2012 at 1:40 AM, amlan barua wrote: >> >>> Hi Barry, >>> I wrote the following piece according to your suggestions. Currently it >>> does nothing but creates a vector with 1 at 1th position, 2 at 2th and so >>> on. But I made it serial, i.e. (n+1)th place is computed using the value of >>> nth place. My question, did I do it correctly, i.e. is it safe or results >>> may change depending on problem size? This is much faster than >>> VecSetValues, I believe the communication is minimum here because I take >>> the advantage of ghost points. >>> Amlan >>> >>> PetscInitialize(&argc,&argv,(char *)0,help); >>> ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size); CHKERRQ(ierr); >>> ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr); >>> ierr = >>> DMDACreate1d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,size*5,1,1,PETSC_NULL,&da); >>> CHKERRQ(ierr); >>> ierr = DMCreateGlobalVector(da,&vec); CHKERRQ(ierr); >>> ierr = VecSet(vec,1.00); >>> ierr = DMCreateLocalVector(da,&local); >>> ierr = DMDAGetLocalInfo(da,&info); >>> ierr = DMDAVecGetArray(da,vec,&arr); >>> ierr = DMDAVecGetArray(da,local,&array); >>> temp = 1; >>> for (j=0;j>> >> >> This is needlessly sequential (or should be). >> >> >>> ierr = DMGlobalToLocalBegin(da,vec,INSERT_VALUES,local); >>> CHKERRQ(ierr); >>> ierr = DMGlobalToLocalEnd(da,vec,INSERT_VALUES,local); >>> CHKERRQ(ierr); >>> >> >> You should never use a communication routine while you have access to the >> array (*VecGetArray()). >> >> >>> if (rank==j) { >>> for (i=info.xs;i>> if ((!i)==0) { >>> array[i] = array[i] + array[i-1]; >>> >> >> What sort of recurrence do you actually want to implement. When possible, >> it's much better to reorganize so that you can do local work followed by an >> MPI_Scan followed by more local work. MPI_Scan is fast (logarithmic). >> >> >>> arr[i] = array[i]; >>> } >>> } >>> } >>> } >>> ierr = DMDAVecRestoreArray(da,local,&array); >>> ierr = DMDAVecRestoreArray(da,vec,&arr); >>> ierr = VecView(vec,PETSC_VIEWER_STDOUT_WORLD); >>> PetscFinalize(); >>> return 0; >>> >>> >>> >>> On Thu, Dec 27, 2012 at 10:40 AM, Barry Smith wrote: >>> >>>> >>>> On Dec 27, 2012, at 10:34 AM, amlan barua wrote: >>>> >>>> > I think I can use VecSetValues, is that right? >>>> >>>> Yes you could do that. 
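To make the "local work, then MPI_Scan, then local work" suggestion quoted above concrete: for the running-sum test problem in this thread (each entry becomes itself plus the previous entry), a sketch might look like the following. It is plain MPI on a local array xloc of length nloc (names assumed for illustration, real double-precision values assumed), and it only applies when the recurrence can be rewritten as an associative accumulation; a general x[i+1] = f(x[i]) cannot be rearranged this way, which is the reason for algorithms such as parareal in the first place.

double   localsum = 0.0, inclusive, offset;
PetscInt i;

/* 1. local work: inclusive prefix sums of the locally owned entries */
for (i = 0; i < nloc; i++) {
  localsum += xloc[i];
  xloc[i]   = localsum;
}

/* 2. one logarithmic reduction: inclusive scan of the per-rank totals */
MPI_Scan(&localsum, &inclusive, 1, MPI_DOUBLE, MPI_SUM, PETSC_COMM_WORLD);
offset = inclusive - localsum;   /* sum of the totals of all previous ranks */

/* 3. local work: shift the local prefix sums by the per-rank offset */
for (i = 0; i < nloc; i++) xloc[i] += offset;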
But since you are using a DMDA you could also >>>> use DMGetLocalVector(), DMGlobalToLocalBegin/End() followed by >>>> DMDAVecGetArray() to access the ghost values. >>>> >>>> Barry >>>> >>>> > Amlan >>>> > >>>> > >>>> > On Thu, Dec 27, 2012 at 9:04 AM, amlan barua wrote: >>>> > Hi Barry, >>>> > Is this scattering a very costly operation? I have to compute x[i] = >>>> f(x[i-1]) where f is known. Since this operation is strictly sequential, I >>>> thought of gathering the entire vector on processor 0, do the sequential >>>> operation there and scatter the result back. However this is unnecessary >>>> because I only need the bordering x[i] values. What can be a better way? >>>> > Amlan >>>> > >>>> > >>>> > On Thu, Dec 27, 2012 at 8:18 AM, Barry Smith >>>> wrote: >>>> > >>>> > ierr = DMDACreateNaturalVector(da,&natural);CHKERRQ(ierr); >>>> > ierr = >>>> DMDAGlobalToNaturalBegin(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); >>>> > ierr = >>>> DMDAGlobalToNaturalEnd(da,xin,INSERT_VALUES,natural);CHKERRQ(ierr); >>>> > >>>> > Now do VecScatterCreateToZero() from natural and the vector will be >>>> in the natural ordering on process zero with the dof interlaced. >>>> > >>>> > >>>> > Barry >>>> > >>>> > On Dec 27, 2012, at 12:22 AM, amlan barua wrote: >>>> > >>>> > > Hi, >>>> > > Is there an analogue of VecScatterCreateToZero for DA vectors? The >>>> DMDA object has more than one degrees of freedom. >>>> > > If there isn't any, should I use an IS object to do the scattering? >>>> > > Amlan >>>> > >>>> > >>>> > >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jedbrown at mcs.anl.gov Mon Dec 31 00:06:58 2012 From: jedbrown at mcs.anl.gov (Jed Brown) Date: Mon, 31 Dec 2012 00:06:58 -0600 Subject: [petsc-users] PetscKernel_A_gets_inverse_A_ In-Reply-To: <50D42C0F.2090301@unibas.it> References: <50D42C0F.2090301@unibas.it> Message-ID: Sorry about the slow response. Do you only want the square kernels with static (compile-time) size? These are currently implemented as macros, but it's not a problem to modify them to be functions. C99 provides a well-defined mechanism to have a single version with external linkage, but also encourage inlining. I'm considering providing a set of kernels that would support non-square matrices, but the naming conventions would have to change and my applications would frequently not know the size statically (but it would typically be ~10 or less, so calling BLAS doesn't make sense). On Fri, Dec 21, 2012 at 3:29 AM, Aldo Bonfiglioli < aldo.bonfiglioli at unibas.it> wrote: > Dear all, > would it be possible to have a unified interface (also Fortran callable) > to the PetscKernel_A_gets_inverse_A_ routines? > I find them very useful within my own piece > of Fortran code to solve small dense linear system (which I have > to do very frequently). > I have my own interface, at present, but I need to > change it as needed when a new PETSc version is released. > > Regards, > Aldo > -- > Dr. Aldo Bonfiglioli > Associate professor of Fluid Flow Machinery > Scuola di Ingegneria > Universita' della Basilicata > V.le dell'Ateneo lucano, 10 85100 Potenza ITALY > tel:+39.0971.205203 fax:+39.0971.205215 > > > Publications list > -------------- next part -------------- An HTML attachment was scrubbed... URL:
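The C99 mechanism Jed refers to is the inline-definition-in-a-header plus one extern declaration pattern. A sketch is given below; the 2x2 in-place inverse and all names are purely illustrative and are not the actual PetscKernel_A_gets_inverse_A_ code. It assumes C99 inline semantics (e.g. -std=c99); the single external definition is what keeps the function callable from Fortran or from call sites the compiler chooses not to inline.

/* kernels_example.h (illustrative) */
/* C99 inline definition: every translation unit that includes this header can
   inline the call, but this definition by itself emits no external symbol. */
inline void KernelExample_A_gets_inverse_A_2(double a[4])
{
  /* in-place inverse of a 2x2 row-major matrix; no singularity check */
  double det = a[0]*a[3] - a[1]*a[2];
  double a00 = a[0];
  a[0] =  a[3]/det;  a[1] = -a[1]/det;
  a[2] = -a[2]/det;  a[3] =  a00/det;
}

/* kernels_example.c (illustrative) */
/* #include "kernels_example.h" */
/* Exactly one translation unit declares the function with extern, which turns
   the inline definition above into the one external definition, so non-inlined
   and Fortran calls still link against a single symbol. */
extern inline void KernelExample_A_gets_inverse_A_2(double a[4]);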