[petsc-users] Slow MatAssembly and MatMat Mult
Bikash Kanungo
bikash at umich.edu
Wed Feb 17 21:52:44 CST 2016
Thanks a lot Barry. I guess it's time to bite the bullet.
Regards,
Bikash
On Wed, Feb 17, 2016 at 10:30 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> You need to bite the bullet and do the communication needed to get the
> info on the sub communicator and then get the result back out to the entire
> communicator.
>
> If the two matrices are generated on MPI_COMM_WORLD you can use
> MatCreateRedundantMatrix() to get entire copies of them on sub
> communicators. You can actually then have each sub communicator do a
> certain number of multiplies of the sparse matrix with different columns of
> the dense matrix so that instead of having one sub communicator do 2500
> sparse matrix vector products you can have each sub communicator (say you
> have 5 of them) do 500 sparse matrix vector products (giving two levels of
> parallelism). The results are dense matrices so you would need to write
> some code to get the parts of the resulting dense matrices back to the
> processes where you want them. I would suggest using MPI calls directly for
> this; I don't think PETSc has anything particularly useful to do that.
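>
> Roughly, the redundant-copy step would look like this (untested sketch;
> here A is your MPIAIJ matrix and B your MPIDENSE matrix, both living on
> MPI_COMM_WORLD):
>
>   Mat            Ared, Bred, Cred;
>   PetscErrorCode ierr;
>   PetscInt       nsub = 5;   /* number of sub communicators, e.g. 5 */
>
>   /* full copies of A and B on each sub communicator; with MPI_COMM_NULL
>      PETSc splits MPI_COMM_WORLD into nsub sub communicators itself */
>   ierr = MatCreateRedundantMatrix(A,nsub,MPI_COMM_NULL,MAT_INITIAL_MATRIX,&Ared);CHKERRQ(ierr);
>   ierr = MatCreateRedundantMatrix(B,nsub,MPI_COMM_NULL,MAT_INITIAL_MATRIX,&Bred);CHKERRQ(ierr);
>
>   /* each sub communicator multiplies its own copies; for the two levels of
>      parallelism you would first restrict Bred to the block of columns this
>      sub communicator is responsible for (not shown) */
>   ierr = MatMatMult(Ared,Bred,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&Cred);CHKERRQ(ierr);
>
>   /* finally use MPI directly, e.g. MPI_Allgatherv() on MPI_COMM_WORLD, to
>      move the pieces of the dense result back to where you need them */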
>
> Barry
>
>
> > On Feb 17, 2016, at 9:17 PM, Bikash Kanungo <bikash at umich.edu> wrote:
> >
> > Hi Barry,
> >
> > I had thought of using a sub-communicator for these operations. But the
> > matrix entries have contributions from all the processors. Moreover, after
> > the MatMatMult operation is done, I need to retrieve certain values from
> > the resultant matrix through non-local calls (MatGetSubMatrices) on all
> > processors. Defining these matrices to reside on a sub-communicator would
> > prevent me from adding contributions and calling MatGetSubMatrices from
> > processors not within the sub-communicator. What would be a good workaround
> > for it?
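> >
> > For reference, the retrieval step looks roughly like this (sketch; C is
> > the MatMatMult result, and nrows/rowids, ncols/colids stand for the
> > possibly non-local indices each processor needs):
> >
> >   IS  isrow, iscol;
> >   Mat *Csub;
> >
> >   ISCreateGeneral(PETSC_COMM_SELF,nrows,rowids,PETSC_COPY_VALUES,&isrow);
> >   ISCreateGeneral(PETSC_COMM_SELF,ncols,colids,PETSC_COPY_VALUES,&iscol);
> >   /* collective on C's communicator: every processor owning part of C
> >      must participate, even when the requested entries are non-local */
> >   MatGetSubMatrices(C,1,&isrow,&iscol,MAT_INITIAL_MATRIX,&Csub);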
> >
> > Regards,
> > Bikash
> >
> > On Wed, Feb 17, 2016 at 9:33 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > This is an absolutely tiny problem for 480 processors; I am not
> > surprised by the terrible performance. You should run this sub-calculation
> > on a small subset of the processes.
> >
> > Barry
> >
> > > On Feb 17, 2016, at 7:03 PM, Bikash Kanungo <bikash at umich.edu> wrote:
> > >
> > > Hi,
> > >
> > > I have two small (2500x2500) matrices parallelized across 480
> > > processors. One of them is an MPIAIJ matrix while the other is an MPIDENSE
> > > matrix. I perform a MatMatMult involving these two matrices. I tried these
> > > operations on two machines: one is the local cluster at the University of
> > > Michigan and the other is the XSEDE Comet machine. The Comet machine takes
> > > 10-20 times longer in steps involving MatAssembly and MatMatMult of the
> > > aforementioned matrices. I have other PETSc MatMult operations in the same
> > > code involving larger matrices (4 million x 4 million) which show similar
> > > timing on both machines. It's just those small parallel matrices that are
> > > inconsistent in terms of their timings. I used the same compilers and MPI
> > > libraries on both machines, except that I have suppressed the "avx2" flag on
> > > Comet. I believe avx2 affects floating-point operations and not
> > > communication. I would like to know what might be causing these
> > > inconsistencies only in the case of the small matrices. Are there any network
> > > settings that I can look into and compare?
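> > >
> > > For context, the step in question has roughly this shape (sketch, not the
> > > actual code; d_nnz and o_nnz stand for my preallocation arrays):
> > >
> > >   Mat A, B, C;   /* A: MPIAIJ, B: MPIDENSE, both 2500 x 2500 */
> > >
> > >   MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,2500,2500,
> > >                0,d_nnz,0,o_nnz,&A);
> > >   MatCreateDense(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,2500,2500,NULL,&B);
> > >   /* ... MatSetValues() with contributions from all processors ... */
> > >   MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
> > >   MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY); MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY);
> > >   /* the assembly above and this product are what run 10-20x slower on Comet */
> > >   MatMatMult(A,B,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&C);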
> > >
> > > Regards,
> > > Bikash
> > >
> > > --
> > > Bikash S. Kanungo
> > > PhD Student
> > > Computational Materials Physics Group
> > > Mechanical Engineering
> > > University of Michigan
> > >
> >
> >
> >
> >
> > --
> > Bikash S. Kanungo
> > PhD Student
> > Computational Materials Physics Group
> > Mechanical Engineering
> > University of Michigan
> >
>
>
--
Bikash S. Kanungo
PhD Student
Computational Materials Physics Group
Mechanical Engineering
University of Michigan