[petsc-dev] no petsc on Edison
Barry Smith
bsmith at mcs.anl.gov
Mon Feb 27 14:31:25 CST 2017
Hong,
Very nice, thanks!
Barry
> On Feb 27, 2017, at 2:06 PM, Hong <hzhang at mcs.anl.gov> wrote:
>
> Mark,
> I implemented a scalable MatPtAP and compared three implementations using ex56.c on the ALCF Cetus machine (which has little memory, 1 GB/core):
> - nonscalable PtAP: uses an array of length PN to do dense axpy
> - scalable PtAP: does sparse axpy without the length-PN array (see the sketch after this list)
> - hypre PtAP.
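>
> A minimal sketch of the two accumulation strategies (illustrative only, not the actual PETSc source; all names are made up). Both compute one row of a sparse product in CSR form. The dense variant scatters into a work array whose length is the full column count N, so its per-process memory grows with the global problem size; the sparse variant only touches entries that actually appear:
>
>   #include <string.h>
>
>   /* Dense axpy: val[] has length N (all columns) and is zeroed on entry. */
>   static void row_product_dense(int anz, const int *aj, const double *aa,
>                                 const int *bi, const int *bj, const double *ba,
>                                 double *val)
>   {
>     int k, p;
>     for (k = 0; k < anz; k++) {           /* each nonzero a(i,j) of row i of A */
>       int j = aj[k];
>       for (p = bi[j]; p < bi[j+1]; p++)   /* row_i(C) += a(i,j) * row_j(B) */
>         val[bj[p]] += aa[k] * ba[p];      /* O(1) scatter, but O(N) memory */
>     }
>   }
>
>   /* Sparse axpy: keep (col,val) pairs sorted; memory is O(nnz of the row).
>      Linear insertion keeps the sketch short; real code merges faster. */
>   static int row_product_sparse(int anz, const int *aj, const double *aa,
>                                 const int *bi, const int *bj, const double *ba,
>                                 int *ccol, double *cval)
>   {
>     int k, p, q, cnz = 0;
>     for (k = 0; k < anz; k++) {
>       int j = aj[k];
>       for (p = bi[j]; p < bi[j+1]; p++) {
>         int    col = bj[p];
>         double v   = aa[k] * ba[p];
>         for (q = 0; q < cnz && ccol[q] < col; q++) ; /* find insertion point */
>         if (q < cnz && ccol[q] == col) cval[q] += v; /* existing entry: axpy */
>         else {                                       /* new entry: shift, insert */
>           memmove(ccol+q+1, ccol+q, (cnz-q)*sizeof(*ccol));
>           memmove(cval+q+1, cval+q, (cnz-q)*sizeof(*cval));
>           ccol[q] = col; cval[q] = v; cnz++;
>         }
>       }
>     }
>     return cnz;
>   }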
>
> The results are attached. Summary:
> - nonscalable PtAP is 2x faster than the scalable version and 8x faster than hypre's PtAP
> - scalable PtAP is 4x faster than hypre's PtAP
> - hypre uses less memory (see job.ne399.n63.np1000.sh)
>
> Based on the above observations, I set the default PtAP algorithm to 'nonscalable'.
> When PN > the estimated local number of nonzeros of C = PtAP, the default switches to 'scalable'.
> Users can override the default.
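>
> A sketch of that selection logic (hypothetical names, not the merged code):
>
>   #include <petscsys.h>
>
>   typedef enum {PTAP_NONSCALABLE, PTAP_SCALABLE} PtAPAlgSketch; /* illustrative */
>
>   /* Default to the fast nonscalable algorithm unless its length-PN dense
>      work array would exceed the estimated local nonzeros of C = PtAP. */
>   static PtAPAlgSketch PickPtAPDefault(PetscInt PN, PetscInt nz_local_C)
>   {
>     return (PN > nz_local_C) ? PTAP_SCALABLE : PTAP_NONSCALABLE;
>   }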
>
> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
> MatPtAP          3.6224e+01  (nonscalable for small mats, scalable for larger ones)
> scalable MatPtAP 4.6129e+01
> hypre PtAP       1.9389e+02
>
> I'm merging this work into next, then into master soon.
>
> Hong
>
> On Wed, Feb 8, 2017 at 7:25 PM, Mark Adams <mfadams at lbl.gov> wrote:
> Thanks Hong, that sounds great.
>
> I am wary of silent optimizations like the one you suggest, but 2x is big! And failure is very bad. So I would vote for your suggestion.
>
> ex56 is an elasticity problem. Now that you have this experimental setup in place, it would be nice to compare with hypre on a 3D scalar problem. 3x better than hypre seems large to me; I have to suspect that hypre does not spend much effort optimizing for block matrices and does not exploit them as much as we do.
>
> Thanks again,
>
> On Wed, Feb 8, 2017 at 12:07 PM, Hong <hzhang at mcs.anl.gov> wrote:
> I conducted tests of MatMatMult() and MatPtAP() using petsc/src/ksp/ksp/examples/tutorials/ex56.c (gamg) on an 8-core machine (the petsc machine). The output file is attached.
>
> Summary:
> 1) The non-scalable MatMatMult() for the mpiaij format is 2x faster than the scalable version. The major difference between the two is dense axpy vs. sparse axpy.
>
> Currently, we set non-scalable as the default, which causes trouble when running large problems.
> How about setting the default as
> - non-scalable for small to medium-size matrices
> - scalable for larger ones, e.g.
>
> + ierr = PetscOptionsEList("-matmatmult_via","Algorithmic approach","MatMatMult",algTypes,nalg,algTypes[1],&alg,&flg);CHKERRQ(ierr);
>
> + if (!flg) { /* user did not pick an algorithm: set the default based on B->cmap->N */
> +   PetscMPIInt size;
> +   ierr = MPI_Comm_size(comm,&size);CHKERRQ(ierr);
> +   if ((PetscReal)(B->cmap->N)/size > 100000.0) alg = 0; /* many columns per process: use the scalable algorithm */
> + }
>
> i.e., if the user does NOT pick an algorithm and the average number of columns per process exceeds 100k, use the scalable implementation; otherwise, use the non-scalable version.
>
> 2) We do NOT have a scalable implementation of MatPtAP() yet.
> We have the non-scalable PtAP and an interface to hypre's PtAP. Comparing the two,
> PETSc's MatPtAP() is approx 3x faster than hypre's.
>
> I'm writing a scalable MatPtAP() now.
>
> Hong
>
> On Thu, Feb 2, 2017 at 2:54 PM, Stefano Zampini <stefano.zampini at gmail.com> wrote:
>
>
> On 02 Feb 2017 at 23:43, "Mark Adams" <mfadams at lbl.gov> wrote:
>
>
> On Thu, Feb 2, 2017 at 12:02 PM, Stefano Zampini <stefano.zampini at gmail.com> wrote:
> Mark,
>
> I saw that your configuration has hypre. If you can run with master, you may try -matptap_via hypre.
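>
> A minimal sketch of setting the same option in code, assuming the default options database (normally you would just pass the option on the command line):
>
>   #include <petscsys.h>
>
>   int main(int argc, char **argv)
>   {
>     PetscErrorCode ierr;
>     ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>     /* equivalent to passing -matptap_via hypre on the command line */
>     ierr = PetscOptionsSetValue(NULL, "-matptap_via", "hypre");CHKERRQ(ierr);
>     /* ... set up GAMG and solve as usual ... */
>     ierr = PetscFinalize();
>     return ierr;
>   }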
>
> This is worth trying. Does this even work with GAMG?
>
> Yes, it should work, except that the block sizes, if any, are not propagated to the resulting matrix. I can add it if you need it.
>
>
>
> Treb: try hypre anyway. It has its own RAP code.
>
>
> With that option, you will use hypre's RAP with MATAIJ
>
>
> It uses BoomerAMGBuildCoarseOperator directly with the AIJ matrices.
>
> Stefano
>
>> On Feb 2, 2017, at 7:28 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>>
>>
>> On Thu, Feb 2, 2017 at 11:13 AM, Hong <hzhang at mcs.anl.gov> wrote:
>> Mark:
>> Try '-matmatmult_via scalable' first. If this works, should we set it as the default?
>>
>> If it is robust, I would say yes, unless it is noticeably slower (say >20%) on small-scale problems.
>>
> <out_ex56_cetus_short>