[petsc-users] Solving A*X = B where A and B are matrices

Matthew Knepley knepley at gmail.com
Mon Dec 3 13:20:54 CST 2012


On Mon, Dec 3, 2012 at 1:08 PM, Jelena Slivka <slivkaje at gmail.com> wrote:
> Thank you very much!
> However, I have another question. I have a cluster of 4 nodes and each node
> has 6 cores. If I run my code using 6 cores on one node (using the command
> "mpiexec -n 6") it is much faster than running it on just one process (which
> is expected). However, if I try running the code on multiple nodes (using
> "mpiexec -f machinefile -ppn 4", where machinefile is the file which
> contains the node names), it runs much slower than on just one process. This
> also happens with tutorial examples. I have checked the number of iteration
> for KSP solver when spread on multiple processors and it doesn't seem to be
> the problem. Do you have any suggestions on what am I doing wrong? Are the
> commands I am using wrong?

Most operations are memory bandwidth limited, and it sounds like the memory
bandwidth for your cluster is maxed out by 1-2 procs.

   Matt

> On Sat, Dec 1, 2012 at 6:03 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>
>>     We recommend following the directions
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#schurcomplement  for
>> computing a Schur complement; just skip the unneeded step. MUMPS supports a
>> parallel Cholesky but you can also use a parallel LU with MUMPS, PaSTIX or
>> SuperLU_Dist and those will work fine also. With current software Cholesky
>> in parallel is not tons better than LU so generally not worth monkeying
>> with.
>>
>>    Barry
>>
>>
>> On Dec 1, 2012, at 12:05 PM, Jelena Slivka <slivkaje at gmail.com> wrote:
>>
>> > Hello!
>> > I am trying to solve A*X = B where A and B are matrices, and then find
>> > trace of the resulting matrix X. My approach has been to partition matrix B
>> > in column vectors bi and then solve each system A*xi = bi. Then, for all
>> > vectors xi I would extract i-th element xi(i) and sum those elements in
>> > order to get Trace(X).
>> > Pseudo-code:
>> > 1) load matrices A and B
>> > 2) transpose matrix B (so that each right-hand side bi is in the row, as
>> > operation MatGetColumnVector is slow)
>> > 3) set up KSPSolve
>> > 4) create vector diagonal (in which xi(i) elements will be stored)
>> > 5) for each row i of matrix B owned by current process:
>> >           - create vector bi by extracting row i from matrix B
>> >           - apply KSPsolve to get xi
>> >           - insert value xi(i) in diagonal vector (only the process
>> > which
>> >             holds the ith value of vector x(i) should do so)
>> > 6) sum vector diagonal to get the trace.
>> > However, my code (attached, along with the test case) runs fine on one
>> > process, but hangs if started on multiple processes. Could you please help
>> > me figure out what am I doing wrong?
>> > Also, could you please tell me is it possible to use Cholesky
>> > factorization when running on multiple processes (I see that I cannot use it
>> > when I set the format of matrix A to MPIAIJ)?
>> >
>> > <Experiment.c><Abin><Bbin>
>>
>



--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener


More information about the petsc-users mailing list