[petsc-users] Avoiding malloc overhead for unstructured finite element meshes

Thu Jun 28 10:36:44 CDT 2012

From petsc_info.log:

0: [0] MatStashScatterBegin_Private(): No of messages: 0
0: [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
1: [1] MatAssemblyBegin_MPIAIJ(): Stash has 645696 entries, uses 6 mallocs.

This means that rank 1 generates a bunch of entries in rows owned by rank
0, but not vice-versa. The number of entries is somewhat high, but not
unreasonable.
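
To make the stash count concrete, here is a minimal sketch in plain C (no PETSc calls) of how one could estimate it before assembly. It assumes each element contributes a dense dofs_per_elem x dofs_per_elem block and that [rstart, rend) is the local row range, such as MatGetOwnershipRange() would report. The function name and the flat elem_dofs layout are hypothetical, not taken from the code under discussion.

```c
#include <stddef.h>

/* Hypothetical helper: count the element-matrix entries whose row is NOT
 * in the locally owned range [rstart, rend). Entries like these cannot be
 * stored locally; they are stashed and sent to the owning rank during
 * assembly. elem_dofs holds n_elems * dofs_per_elem global dof indices. */
long count_stashed_entries(const int *elem_dofs, size_t n_elems,
                           size_t dofs_per_elem, int rstart, int rend)
{
    long stashed = 0;
    for (size_t e = 0; e < n_elems; ++e) {
        const int *dofs = elem_dofs + e * dofs_per_elem;
        for (size_t i = 0; i < dofs_per_elem; ++i) {
            if (dofs[i] < rstart || dofs[i] >= rend) {
                /* every column of this off-process row is stashed */
                stashed += (long)dofs_per_elem;
            }
        }
    }
    return stashed;
}
```

The count includes repeated (row, col) pairs coming from neighbouring elements, since this sketch assumes each inserted value is stashed as given.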

1: [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 334296 X 334296; storage space: 0 unneeded,25888950 used
1: [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
1: [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
1: [1] Mat_CheckInode(): Found 111432 nodes of 334296. Limit used: 5. Using Inode routines

Rank 1 preallocated correctly, no problem here.

0: [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 346647 X 346647; storage space: 9900 unneeded,26861247 used
0: [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 1608

This number of mallocs is the real problem: you have not preallocated
correctly for the "diagonal" block of the matrix on rank 0. Fix the
preallocation and it will be fast. Everything below is fine.
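
As an illustration of what correct preallocation needs, here is a minimal sketch in plain C that counts, for each locally owned row, the distinct columns falling inside the diagonal block (d_nnz) and outside it (o_nnz). It assumes a square matrix whose column ownership matches the row range [rstart, rend), and dense element blocks. The names count_nnz, Pair and elem_dofs are hypothetical. The resulting arrays are what one would pass to MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz), after converting to PetscInt if that type is not a plain int in your build.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int row, col; } Pair;

/* Order pairs by row, then by column. */
static int cmp_pair(const void *a, const void *b)
{
    const Pair *p = a, *q = b;
    if (p->row != q->row) return (p->row > q->row) - (p->row < q->row);
    return (p->col > q->col) - (p->col < q->col);
}

/* Hypothetical helper: fill d_nnz[] and o_nnz[] (each of length
 * rend - rstart) with the number of distinct columns per owned row.
 * Returns 0 on success, -1 if the work array cannot be allocated. */
int count_nnz(const int *elem_dofs, size_t n_elems, size_t dofs_per_elem,
              int rstart, int rend, int *d_nnz, int *o_nnz)
{
    assert(rstart <= rend);
    size_t cap = n_elems * dofs_per_elem * dofs_per_elem, n = 0;
    Pair *pairs = malloc((cap ? cap : 1) * sizeof *pairs);
    if (!pairs) return -1;

    for (int r = 0; r < rend - rstart; ++r) d_nnz[r] = o_nnz[r] = 0;

    /* Gather every (row, col) coupling whose row is owned locally. */
    for (size_t e = 0; e < n_elems; ++e) {
        const int *dofs = elem_dofs + e * dofs_per_elem;
        for (size_t i = 0; i < dofs_per_elem; ++i) {
            if (dofs[i] < rstart || dofs[i] >= rend) continue;
            for (size_t j = 0; j < dofs_per_elem; ++j) {
                pairs[n].row = dofs[i];
                pairs[n].col = dofs[j];
                ++n;
            }
        }
    }

    /* Sort so duplicates are adjacent, then count each pair once. */
    qsort(pairs, n, sizeof *pairs, cmp_pair);
    for (size_t k = 0; k < n; ++k) {
        if (k > 0 && cmp_pair(&pairs[k], &pairs[k - 1]) == 0) continue;
        int r = pairs[k].row - rstart;
        if (pairs[k].col >= rstart && pairs[k].col < rend) ++d_nnz[r];
        else ++o_nnz[r];
    }
    free(pairs);
    return 0;
}
```

Note that the sketch only sees the couplings in the element list it is given. If an owned row also receives contributions from elements held by another rank, those elements (or at least their connectivity) have to be included in the count, otherwise the row is under-allocated.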

0: [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
0: [0] Mat_CheckInode(): Found 115549 nodes of 346647. Limit used: 5. Using Inode routines
0: [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
0: [0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
1: [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
1: [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
0: [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
0: [0] VecScatterCreateCommon_PtoS(): Using blocksize 3 scatter
0: [0] VecScatterCreate(): Special case: blocked indices to stride
0: [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 346647 X 12234; storage space: 0 unneeded,308736 used
0: [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
0: [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 51
1: [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 334296 X 12210; storage space: 0 unneeded,308736 used
1: [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
1: [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 45

On Thu, Jun 28, 2012 at 5:59 AM, Thomas DE-SOZA <thomas.de-soza at edf.fr> wrote:

>
> Dear PETSc Users,
>
> We're experiencing performance issues after having switched to fully
> distributed meshes in our in-house code and would like your opinion on the
> matter.
>
> In the current version of our structural mechanics FEA software (
> http://www.code-aster.org/), all MPI processes have knowledge of the
> whole matrix and therefore can easily pass it to PETSc without the need for
> any communication. In a nutshell, stash is empty after MatSetValues and no
> mallocs occur during assembly.
> We're now building a distributed version of the software with each process
> reading its own subdomain in order to save memory. The mesh was partitioned
> with Metis and as a first approach we built a simple partition of the
> degrees of freedom based on the gradual subdomains. This eliminates the
> need for Application Ordering but yields an unbalanced decomposition in
> terms of rows. If we take an example with 2 MPI processes: processor 0
> will have more unknowns than processor 1 and will receive entries lying on
> the interface whereas processor 1 will have all entries locally.
>
> The PETSc manual states that "It is fine to generate some entries on the
> "wrong" process. Often this can lead to cleaner, simpler, less buggy codes.
> One should never make code overly complicated in order to generate all
> values locally. Rather, one should organize the code in such a way that
> most values are generated locally."
> Judging from the performance we obtain on a simple cube with two
> processes, it seems we have generated too many entries on the wrong
> process. Indeed our distributed code runs slower than the current one.
> However the stash does not seem to contain that much (650 000 out of a total
> of 50 000 000 nnz). We have attached the output obtained with "-info" as
> well as the "-log_summary" profiling. Most of the time is spent in the
> assembly and a lot of mallocs occur.
>
>
>
>
> What's your advice on this? Is working with ghost cells the only option?
> We were wondering if we could preallocate the stash for example to
> decrease the number of mallocs.
>
> Regards,
>
> Thomas