Hi Hong,<br><br>Thanks for your reply. I checked the attachment I sent last time and found that some important information was missing. I have attached the complete information to this email. Sorry about that.<br><br>I also tried other matrix orderings, such as -mat_mumps_icntl_7 2, and got similar performance. I checked the configuration log file on that cluster: PETSc was built with --download-f-blas-lapack=1 instead of an optimized BLAS. Could this be what causes the poor performance? MUMPS runs quite slowly, at 14 Gflop/s, which is far below the machine's peak.<br>
<br>Thanks.<br>Wen<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Wen:<br>
<br>
> Reply to<br>
><br>
>> This is weird. Try<br>
>> 1) increasing the workspace with<br>
>> -mat_mumps_icntl_14 50 (the default is 20)<br>
>> 2) different matrix orderings with<br>
>> -mat_mumps_icntl_7 2 (or any number from 0 to 6)<br>
>><br>
>> Run your code with '-log_summary' and see which routine causes this huge<br>
>> difference.<br>
>><br>
> Why does your '-log_summary' output show only these events?<br>
KSPSolve 4 1.0 2.2645e+03 1.0 0.00e+00 0.0 3.9e+04 3.6e+02 5.4e+01 96 0 27 0 9 96 0 27 0 9 0<br>
PCSetUp 4 1.0 2.2633e+03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.4e+01 96 0 0 0 6 96 0 0 0 6 0<br>
PCApply 4 1.0 1.1641e+00 1.0 0.00e+00 0.0 3.9e+04 3.6e+02 2.0e+01 0 0 27 0 3 0 0 27 0 3 0<br>
<br>
I get<br>
petsc-dev/src/ksp/ksp/examples/tutorials>mpiexec -n 2 ./ex2 -pc_type lu -pc_factor_mat_solver_package mumps -log_summary<br>
MatMult 2 1.0 1.6904e-04 1.0 4.44e+02 1.0 4.0e+00 5.6e+01 0.0e+00 0 47 25 13 0 0 47 33 13 0 5<br>
MatSolve 2 1.0 3.8259e-03 1.0 0.00e+00 0.0 8.0e+00 1.9e+02 6.0e+00 10 0 50 84 7 11 0 67 87 9 0<br>
MatLUFactorSym 1 1.0 2.9058e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 7 0 0 0 9 8 0 0 0 11 0<br>
MatLUFactorNum 1 1.0 2.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 5 0 0 0 2 6 0 0 0 3 0<br>
...<br>
<br>
I would like to check these functions in your run as well. In addition, have you tried other matrix orderings?<br>
Hong<br>
<br>
><br>
>> Hong<br>
>><br>
>><br>
> I just tested the problem as you suggested: I set icntl_14 = 50 and<br>
> icntl_7 = 5 (METIS), but the problem persists. The first solve took<br>
> 920 seconds and the second solve, with the PC set up for the same<br>
> nonzero pattern, took 215 seconds. I have also attached the<br>
> log_summary output file. Do you have any further suggestions? Thanks.<br>
><br>
> Regards,<br>
> Wen<br>
><br>
><br>
<br>
</blockquote><br></div><br>
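A minimal Python sketch, not part of the original thread, that ranks the quoted '-log_summary' event rows by time, following Hong's suggestion to find which routine causes the difference. It assumes each event row is an event name followed by 20 numeric columns (call count and ratio, max time and ratio, flops and ratio, message count, average message length, reductions, five global percentage columns starting with %T, five stage percentage columns, and Mflop/s), which is the layout the quoted rows follow. The helper name parse_events is hypothetical.

```python
# Hypothetical helper: ranks PETSc -log_summary event rows by max time.
# Assumed row layout: event name followed by 20 numeric columns
# (count, ratio, time, ratio, flops, ratio, mess, avg len, reduct,
#  5 global %-columns starting with %T, 5 stage %-columns, Mflop/s).

NUM_COLUMNS = 20

# Event rows copied from the thread. parse_events reads a flat token
# stream, so it accepts rows whether or not they are wrapped across lines.
SAMPLE = """
KSPSolve 4 1.0 2.2645e+03 1.0 0.00e+00 0.0 3.9e+04 3.6e+02
5.4e+01 96 0 27 0 9 96 0 27 0 9 0
PCSetUp 4 1.0 2.2633e+03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
3.4e+01 96 0 0 0 6 96 0 0 0 6 0
PCApply 4 1.0 1.1641e+00 1.0 0.00e+00 0.0 3.9e+04 3.6e+02
2.0e+01 0 0 27 0 3 0 0 27 0 3 0
"""


def parse_events(text):
    """Return (event, count, max_time_s, pct_total_time) tuples, slowest first."""
    tokens = text.split()
    events = []
    i = 0
    while i < len(tokens):
        if tokens[i][0].isalpha():  # event names start with a letter, numbers do not
            fields = tokens[i + 1:i + 1 + NUM_COLUMNS]
            if len(fields) < NUM_COLUMNS:
                break  # truncated row, nothing more to parse
            # fields[0] = call count, fields[2] = max time (s), fields[9] = global %T
            events.append((tokens[i], int(fields[0]), float(fields[2]), int(fields[9])))
            i += 1 + NUM_COLUMNS
        else:
            i += 1
    return sorted(events, key=lambda e: e[2], reverse=True)


if __name__ == "__main__":
    for name, count, seconds, pct in parse_events(SAMPLE):
        print(f"{name:10s} calls={count:2d} time={seconds:.4e} s  {pct:3d}% of total")
```

On the quoted rows, PCSetUp (about 2263 s) accounts for almost all of KSPSolve (about 2265 s), while PCApply takes about 1.2 s. With -pc_type lu, this suggests the time goes into the factorization done during PC setup rather than into the triangular solves.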