[petsc-users] Scalability issue
Nelson Filipe Lopes da Silva
nelsonflsilva at ist.utl.pt
Thu Aug 20 06:30:56 CDT 2015
Hello.
I am sorry for taking so long to respond. I decided to rewrite my
application in a different way and will send the log_summary output when
I am done reimplementing it.
As for the machine, I am using mpirun to run jobs on an 8-node cluster.
I modified the makefile in the streams folder so it would run using my
hostfile.
The output is attached to this email. It seems reasonable for a cluster
with 8 machines. According to "lscpu", each machine has 1 socket with 4
cores.
Cheers,
Nelson
On 2015-07-24 16:50, Barry Smith wrote:
> It would be very helpful if you ran the code on say 1, 2, 4, 8, 16
> ... processes with the option -log_summary and send (as attachments)
> the log summary information.
>
> Also on the same machine run the streams benchmark; with recent
> releases of PETSc you only need to do
>
> cd $PETSC_DIR
> make streams NPMAX=16 (or whatever your largest process count is)
>
> and send the output.
>
> I suspect that you are doing everything fine and it is more an issue
> with the configuration of your machine. Also read the information at
> http://www.mcs.anl.gov/petsc/documentation/faq.html#computers on
> "binding"
>
> Barry
>
>> On Jul 24, 2015, at 10:41 AM, Nelson Filipe Lopes da Silva
>> <nelsonflsilva at ist.utl.pt> wrote:
>>
>> Hello,
>>
>> I have been using PETSc for a few months now, and it truly is a
>> fantastic piece of software.
>>
>> In my particular example I am working with a large, sparse
>> distributed (MPI AIJ) matrix, which we can refer to as 'G'.
>> G is a wide rectangular matrix (for example, 1.1 million rows by
>> 2.1 million columns). This matrix is typically very sparse and not
>> diagonal 'heavy' (for example, 5.2 million nonzeros, of which ~50% are
>> in the diagonal block of the MPI AIJ representation).
>> To work with this matrix, I also have a few parallel vectors
>> (created using MatCreate Vec), which we can refer to as 'm' and 'k'.
>> I am trying to parallelize an iterative algorithm in which the most
>> computationally heavy operations are:
>>
>> ->Matrix-Vector Multiplication, more precisely G * m + k = b
>> (MatMultAdd). From what I have been reading, to achieve a good speedup
>> in this operation, G should be as close to diagonal as possible, so
>> that communication and computation can overlap. But even when using a G
>> matrix in which the diagonal block has ~95% of the nonzeros, I cannot
>> get a decent speedup. Most of the time, the performance even gets worse.
>>
>> ->Matrix-Matrix Multiplication. In this case I need to perform G *
>> G' = A, where A is later used in the linear solver and G' is the
>> transpose of G. The performance of this operation does not get worse,
>> although the speedup is not very good.
>>
>> ->Linear problem solving. Lastly, in this operation I solve "Ax=b"
>> using the results of the last two operations. I tried to apply an RCM
>> permutation to A to make it more diagonal, for better performance.
>> However, the problem I faced was that the permutation is performed
>> locally on each processor, and thus the final result differs with the
>> number of processors. I assume this was intended to reduce
>> communication. The solution I found was
>> 1-calculate A
>> 2-calculate, locally on 1 machine, the RCM permutation IS using A
>> 3-apply this permutation to the rows of G.
>> This works well, and A is generated as if RCM permuted. It is fine
>> to do this operation on one machine because it is only done once while
>> reading the input. However, the nonzeros of G become more spread out
>> and less concentrated in the diagonal block, causing problems when
>> calculating G * m + k = b.
>>
>> These 3 operations (except the permutation) are performed in each
>> iteration of my algorithm.
>>
>> So, my questions are:
>> -What are the characteristics of G that lead to a good speedup in
>> the operations I described? Am I missing something and too
>> obsessed with the diagonal block?
>>
>> -Is there a better way to permute A without permuting G and still get
>> the same result using 1 or N machines?
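To make the "diagonal block" question concrete, here is a minimal sketch
in plain Python (no PETSc) that estimates what fraction of the nonzeros
fall in the diagonal block for a given process count. It assumes rows and
columns are split into contiguous, near-equal ranges, with the column
ranges following the layout of the input vector m; the function names and
the sample pattern are hypothetical.

```python
def ownership_ranges(n, nprocs):
    # Split n indices into contiguous ranges as evenly as possible;
    # the first (n % nprocs) ranks get one extra index (assumed layout).
    ranges, start = [], 0
    for rank in range(nprocs):
        size = n // nprocs + (1 if rank < n % nprocs else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def owner(index, ranges):
    # Return the rank whose half-open range [lo, hi) contains index.
    for rank, (lo, hi) in enumerate(ranges):
        if lo <= index < hi:
            return rank
    raise ValueError("index out of range")

def diagonal_block_fraction(entries, nrows, ncols, nprocs):
    # An entry (i, j) is in the diagonal block when the rank owning row i
    # also owns column j, i.e. the needed vector entry is already local.
    row_ranges = ownership_ranges(nrows, nprocs)
    col_ranges = ownership_ranges(ncols, nprocs)
    local = sum(1 for i, j in entries
                if owner(i, row_ranges) == owner(j, col_ranges))
    return local / len(entries)

# Hypothetical nonzero pattern of a 4 x 8 matrix, as (row, column) pairs.
entries = [(0, 0), (0, 1), (1, 2), (1, 7), (2, 4), (2, 5), (3, 0), (3, 6)]

for nprocs in (1, 2, 4):
    print(nprocs, diagonal_block_fraction(entries, 4, 8, nprocs))
```

Every entry outside the diagonal block needs a vector value owned by
another process, so this fraction is only a rough proxy for communication
volume in G * m; it says nothing about memory bandwidth, which is what
the streams benchmark measures.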
>>
>>
>> I have been avoiding asking for help for a while. I'm very sorry for
>> the long email.
>> Thank you very much for your time.
>> Best Regards,
>> Nelson
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: streams.output
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20150820/22dda83c/attachment.ksh>