[petsc-users] Scalability issue

Thu Aug 20 06:30:56 CDT 2015

Hello.

Sorry for the long delay in responding. I decided to rewrite my 
application in a different way and will send the -log_summary output 
once the reimplementation is done.

As for the machine, I am using mpirun to run jobs on an 8-node cluster. 
I modified the makefile in the streams folder so that it runs using my 
hostfile.
The output is attached to this email. It seems reasonable for a cluster 
with 8 machines. According to "lscpu", each machine has 1 socket with 
4 cores.

Cheers,
Nelson

On 2015-07-24 16:50, Barry Smith wrote:
> It would be very helpful if you ran the code on say 1, 2, 4, 8, 16
> ... processes with the option -log_summary and send (as attachments)
> the log summary information.
>
>    Also on the same machine run the streams benchmark; with recent
> releases of PETSc you only need to do
>
> cd $PETSC_DIR
> make streams NPMAX=16 (or whatever your largest process count is)
>
> and send the output.
>
> I suspect that you are doing everything fine and it is more an issue
> with the configuration of your machine. Also read the information at
> http://www.mcs.anl.gov/petsc/documentation/faq.html#computers on
> "binding"
>
>   Barry
>
>> On Jul 24, 2015, at 10:41 AM, Nelson Filipe Lopes da Silva 
>> <nelsonflsilva at ist.utl.pt> wrote:
>>
>> Hello,
>>
>> I have been using PETSc for a few months now, and it truly is a 
>> fantastic piece of software.
>>
>> In my particular example I am working with a large, sparse, 
>> distributed (MPI AIJ) matrix, which I will call 'G'.
>> G is a wide rectangular matrix (for example, 1.1 million rows by 
>> 2.1 million columns). It is typically very sparse and not 
>> diagonally 'heavy' (for example, 5.2 million nnz, of which ~50% are 
>> in the diagonal block of the MPI AIJ representation).
>> To work with this matrix, I also have a few parallel vectors 
>> (created using MatCreateVecs), which I will call 'm' and 'k'.
>> I am trying to parallelize an iterative algorithm in which the most 
>> computationally heavy operations are:
>>
>> ->Matrix-Vector Multiplication, more precisely G * m + k = b 
>> (MatMultAdd). From what I have been reading, to achieve a good 
>> speedup in this operation, G should have as many of its nonzeros as 
>> possible in the diagonal block, so that communication and 
>> computation can overlap. But even when using a G matrix in which the 
>> diagonal block has ~95% of the nnz, I cannot get a decent speedup. 
>> Most of the time, the performance even gets worse.
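The diagonal-block share mentioned above depends on the process count, so it can be worth estimating before running. The following is a minimal sketch in plain Python (not PETSc code) of one way to do that. It assumes rows and columns are split contiguously, with the first N % P ranks receiving one extra index, which is how I understand the default PETSC_DECIDE layout; the function names and the toy matrix are hypothetical.

```python
def ownership_ranges(n, nprocs):
    """Contiguous split of n indices: the first n % nprocs ranks get one extra."""
    ranges, start = [], 0
    for rank in range(nprocs):
        size = n // nprocs + (1 if rank < n % nprocs else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def owner(index, ranges):
    """Return the rank whose half-open range [lo, hi) contains index."""
    for rank, (lo, hi) in enumerate(ranges):
        if lo <= index < hi:
            return rank
    raise IndexError(index)


def diagonal_block_fraction(entries, nrows, ncols, nprocs):
    """Fraction of nonzeros (i, j) whose row owner equals their column owner."""
    if not entries:
        return 0.0
    row_ranges = ownership_ranges(nrows, nprocs)
    col_ranges = ownership_ranges(ncols, nprocs)
    on_diag = sum(
        1 for i, j in entries
        if owner(i, row_ranges) == owner(j, col_ranges)
    )
    return on_diag / len(entries)


# Hypothetical 4 x 8 sparsity pattern, given as (row, column) pairs.
toy_entries = [(0, 0), (0, 1), (1, 2), (1, 7), (2, 4), (3, 6), (3, 0), (2, 5)]

for nprocs in (1, 2, 4):
    print(nprocs, diagonal_block_fraction(toy_entries, 4, 8, nprocs))
```

On one process every nonzero is trivially in the diagonal block, so the interesting numbers are those for the process counts actually used in the runs.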
>>
>> ->Matrix-Matrix Multiplication. In this case I need to compute G * 
>> G' = A, where A is later used in the linear solver and G' is the 
>> transpose of G. Here the performance does not get worse, although 
>> the speedup is not very good.
>>
>> ->Linear problem solving. Lastly, in this operation I solve "Ax=b" 
>> using the results of the last two operations. I tried to apply an 
>> RCM permutation to A to make it more diagonal, for better 
>> performance. However, the problem I faced was that the permutation 
>> is performed locally on each processor, and thus the final result 
>> differs with the number of processors. I assume this was intended 
>> to reduce communication. The solution I found was:
>> 1-calculate A
>> 2-calculate, locally on 1 machine, the RCM permutation IS using A
>> 3-apply this permutation to the rows of G.
>> This works well, and A is generated as if it had been RCM permuted. 
>> It is fine to do this operation on one machine because it is only 
>> done once, while reading the input. However, the nnz of G become 
>> more spread out and less diagonal, causing problems when calculating 
>> G * m + k = b.
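The reason permuting only the rows of G produces an A that looks RCM permuted is the identity (P G)(P G)' = P (G G') P'. The following is a minimal sketch in plain Python that checks this on a tiny dense example. The matrix, the permutation (standing in for an RCM ordering) and the helper names are all hypothetical, and no PETSc calls are involved.

```python
def transpose(m):
    """Transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Dense matrix product a * b."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def permute_rows(m, perm):
    """Row i of the result is row perm[i] of m."""
    return [m[p] for p in perm]


def permute_symmetric(a, perm):
    """Apply the same permutation to the rows and the columns of a."""
    return [[a[p][q] for q in perm] for p in perm]


# Hypothetical wide matrix G (3 rows, 4 columns).
G = [[1, 0, 2, 0],
     [0, 3, 0, 0],
     [4, 0, 0, 5]]
perm = [2, 0, 1]  # hypothetical ordering, standing in for an RCM result

A = matmul(G, transpose(G))

# Route 1: permute the rows of G first, then form G * G'.
PG = permute_rows(G, perm)
A_from_permuted_G = matmul(PG, transpose(PG))

# Route 2: form A first, then permute its rows and columns together.
A_permuted_directly = permute_symmetric(A, perm)

print(A_from_permuted_G == A_permuted_directly)
```

Note that the row permutation leaves the columns of G untouched, so the column layout of G no longer lines up with its row layout, which is consistent with the nonzeros moving out of the diagonal block.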
>>
>> These 3 operations (except the permutation) are performed in each 
>> iteration of my algorithm.
>>
>> So, my questions are:
>> -What are the characteristics of G that lead to a good speedup in 
>> the operations I described? Am I missing something and too 
>> obsessed with the diagonal block?
>>
>> -Is there a better way to permute A without permuting G and still 
>> get the same result using 1 or N machines?
>>
>>
>> I have been avoiding asking for help for a while. I'm very sorry for 
>> the long email.
>> Thank you very much for your time.
>> Best Regards,
>> Nelson
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: streams.output
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20150820/22dda83c/attachment.ksh>