[petsc-users] Communication during MatAssemblyEnd

Jed Brown jed at jedbrown.org
Fri Jun 21 10:56:20 CDT 2019


What is the partition like?  Suppose you randomly assigned nodes to
processes; then in the typical case, all neighbors would be on different
processors.  Then the "diagonal block" would be nearly diagonal and the
off-diagonal block would be huge, requiring communication with many
other processes.
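
To make the partition argument concrete, here is a minimal plain-Python sketch (no PETSc involved). It assumes a 1D chain in which node i couples only to i-1 and i+1, and compares a contiguous block partition with a random assignment of nodes to ranks. The function name `partition_stats` and the sizes `n` and `nprocs` are illustrative choices, not anything from the original code.

```python
import random

def partition_stats(owner, nprocs):
    """For a 1D chain, return (off-process couplings, max number of peer ranks)."""
    n = len(owner)
    offdiag = 0                      # entries that land in the off-diagonal block
    peers = [set() for _ in range(nprocs)]
    for i in range(n):
        for j in (i - 1, i + 1):     # chain neighbors of node i
            if 0 <= j < n and owner[j] != owner[i]:
                offdiag += 1
                peers[owner[i]].add(owner[j])
    return offdiag, max(len(p) for p in peers)

n, nprocs = 10_000, 16               # hypothetical problem and job size

# Contiguous blocks: rank r owns one consecutive range of nodes.
contiguous = [i * nprocs // n for i in range(n)]

# Random assignment: same number of nodes per rank, but scattered.
shuffled = contiguous[:]
random.Random(0).shuffle(shuffled)

contig_stats = partition_stats(contiguous, nprocs)
random_stats = partition_stats(shuffled, nprocs)
print("contiguous:", contig_stats)
print("random:    ", random_stats)
```

With contiguous blocks only the two rows at each block boundary couple to another rank, so each rank talks to at most two peers. With the random assignment almost every coupling is off-process and each rank ends up exchanging data with nearly every other rank.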

"Smith, Barry F. via petsc-users" <petsc-users at mcs.anl.gov> writes:

>    The load balance is definitely out of whack. 
>
>
>
> BuildTwoSidedF         1 1.0 1.6722e-02 41.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0     0   0   0   0   0         0
> MatMult              138 1.0 2.6604e+02  7.4 3.19e+10 2.1 8.2e+07 7.8e+06 0.0e+00    2   4  13  13   0    15  25 100 100   0   2935476
> MatAssemblyBegin       1 1.0 1.6807e-02 36.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0     0   0   0   0   0         0
> MatAssemblyEnd         1 1.0 3.5680e-01  3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0     0   0   0   0   0         0
> VecNorm                2 1.0 4.4252e+01 74.8 1.73e+07 1.0 0.0e+00 0.0e+00 2.0e+00    1   0   0   0   0     5   0   0   0   1     12780
> VecCopy                6 1.0 6.5655e-02  2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0     0   0   0   0   0         0
> VecAXPY                2 1.0 1.3793e-02  2.7 1.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0     0   0   0   0   0  41000838
> VecScatterBegin      138 1.0 1.1653e+02 85.8 0.00e+00 0.0 8.2e+07 7.8e+06 0.0e+00    1   0  13  13   0     4   0 100 100   0         0
> VecScatterEnd        138 1.0 1.3653e+02 22.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    1   0   0   0   0     4   0   0   0   0         0
> VecSetRandom           1 1.0 9.6668e-01  2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0     0   0   0   0   0         0
>
> Note that VecCopy/AXPY/SetRandom, which are all embarrassingly parallel, have a balance ratio above 2, which means some processes have more than twice the work of others. Meanwhile, the ratio for anything with communication is extremely imbalanced: some processes get to the synchronization point well before other processes.
>
> The first thing I would do is worry about the load imbalance: what is its cause? Is it one process with much less work than the others (not great but not terrible), or is it one process with much more work than the others (terrible), or something in between? I think once you get a handle on the load balance the rest may fall into place; otherwise we still have some exploring to do. This is not expected behavior for a good machine with a good network and a well-balanced job. After you understand the load balancing you may need to use one of the parallel performance visualization tools to see why the synchronization is out of whack.
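
The two cases distinguished above (one rank with much less work versus one rank with much more) can be told apart from per-rank timings. The sketch below is plain Python and only an illustration: the function `balance_report`, the threshold `factor`, and the sample timings are all hypothetical. The ratio it computes is max over min across ranks, which is how the Ratio column next to each time in the log is defined.

```python
from statistics import median

def balance_report(times, factor=1.5):
    """Summarize per-rank times: max/min ratio plus the ranks in each tail."""
    tmax, tmin, tmed = max(times), min(times), median(times)
    ratio = tmax / tmin if tmin > 0 else float("inf")
    # Ranks far above the median hold everyone else up at sync points.
    overloaded = [r for r, t in enumerate(times) if t > factor * tmed]
    # Ranks far below the median sit idle, which wastes less.
    underloaded = [r for r, t in enumerate(times) if t < tmed / factor]
    return ratio, overloaded, underloaded

# Made-up timings (seconds) for an 8-rank job; both give a ratio of 2.5.
one_slow = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.5]
one_idle = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.4]

print(balance_report(one_slow))
print(balance_report(one_idle))
```

Both samples produce the same ratio, so the ratio alone does not say which situation applies; looking at which tail the outlier sits in does.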
>
>    Good luck
>
>   Barry
>
>
>> On Jun 21, 2019, at 9:27 AM, Ale Foggia <amfoggia at gmail.com> wrote:
>> 
>> I'm sending one with a bit less time.
>> I'm also timing the functions with std::chrono, and for the case of 180 seconds the program runs out of memory (and crashes) before the PETSc log gets printed, so I know the time only from my own timer. Anyway, in every case, the times from std::chrono and the PETSc log match.
>> 
>> (The large times are in part "4b- Building offdiagonal part" or "Event Stage 5: Offdiag").
>> 
>> On Fri, Jun 21, 2019 at 16:09, Zhang, Junchao (<jczhang at mcs.anl.gov>) wrote:
>> 
>> 
>> On Fri, Jun 21, 2019 at 8:07 AM Ale Foggia <amfoggia at gmail.com> wrote:
>> Thanks both of you for your answers,
>> 
>> On Thu, Jun 20, 2019 at 22:20, Smith, Barry F. (<bsmith at mcs.anl.gov>) wrote:
>> 
>>   Note that this is a one time cost if the nonzero structure of the matrix stays the same. It will not happen in future MatAssemblies.
>> 
>> > On Jun 20, 2019, at 3:16 PM, Zhang, Junchao via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> > 
>> > Those messages were used to build MatMult communication pattern for the matrix. They were not part of the matrix entries-passing you imagined, but indeed happened in MatAssemblyEnd. If you want to make sure processors do not set remote entries, you can use MatSetOption(A,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE), which will generate an error when an off-proc entry is set.
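
As a rough illustration of why a locally assembled 1D Laplacian still needs a communication pattern for MatMult, the plain-Python sketch below models contiguous row ownership and lists the columns each rank references but does not own. The helper names `ownership_range` and `ghost_columns` are hypothetical, and the split rule (the first n % size ranks get one extra row) is an assumption about the default layout, not something stated in this thread.

```python
def ownership_range(n, size, rank):
    """Contiguous row split; assumes the first n % size ranks get one extra row."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    local = base + (1 if rank < extra else 0)
    return start, start + local

def ghost_columns(n, size, rank):
    """Columns of the tridiagonal Laplacian that this rank's rows touch but do not own."""
    start, end = ownership_range(n, size, rank)
    ghosts = set()
    for i in range(start, end):          # only locally owned rows are set
        for j in (i - 1, i, i + 1):      # tridiagonal stencil
            if 0 <= j < n and not (start <= j < end):
                ghosts.add(j)
    return sorted(ghosts)

n, size = 10, 4                          # tiny hypothetical problem
for rank in range(size):
    print(rank, ownership_range(n, size, rank), ghost_columns(n, size, rank))
```

Every rank sets only rows it owns, so no matrix entries travel between ranks, yet the first and last local rows reference a column owned by a neighbor. A matrix-vector product needs those vector entries from the neighbor, and working out who sends what to whom is the setup described above.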
>> 
>> I started being concerned about this when I saw that the assembly was taking a few hundred seconds in my code, around 180 seconds, which for me is a considerable time. Do you think (or maybe you need more information to answer this) that this time is "reasonable" for communicating the pattern for the matrix? I already checked that I'm not setting any remote entries. 
>> It is not reasonable. Could you send log view of that test with 180 seconds MatAssembly?
>>  
>> Also I see (in my code) that even if there are no messages being passed during MatAssemblyBegin, it is taking time and the "ratio" is very big.
>> 
>> > 
>> > 
>> > --Junchao Zhang
>> > 
>> > 
>> > On Thu, Jun 20, 2019 at 4:13 AM Ale Foggia via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> > Hello all!
>> > 
>> > During the conference I showed you a problem happening during MatAssemblyEnd in a particular code that I have. Now, I tried the same with a simple code (a symmetric problem corresponding to the Laplacian operator in 1D, from the SLEPc Hands-On exercises). As I understand (and please, correct me if I'm wrong), in this case the elements of the matrix are computed locally by each process so there should not be any communication during the assembly. However, in the log I get that there are messages being passed. Also, the number of messages changes with the number of processes used and the size of the matrix. Could you please help me understand this?
>> > 
>> > I attach the code I used and the log I get for a small problem.
>> > 
>> > Cheers,
>> > Ale
>> > 
>> 
>> <log.txt>

