The factorizations seem to be going through. Each one seems to take about 40 minutes. <div><br></div><div> -Nachiket<br><br><div class="gmail_quote">On Thu, Dec 13, 2012 at 5:19 PM, Matthew Knepley <span dir="ltr"><<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Thu, Dec 13, 2012 at 1:44 PM, Nachiket Gokhale <<a href="mailto:gokhalen@gmail.com">gokhalen@gmail.com</a>> wrote:<br>
> Thanks - should I attach the debugger in debug mode or in optimized mode?<br>
> I suspect it will be tremendously slow in debug mode, otoh I am not sure if<br>
> it will yield any useful information in optimized mode.<br>
<br>
</div>Optimized will still give a stack trace.<br>
<div class="im"><br>
> Also, will -on_error_attach_debugger do the trick?<br>
<br>
</div>No, either spawn one with -start_in_debugger -debugger_nodes 0, or attach<br>
to a running process using gdb -p <proc id><br>
<br>
Matt<br>
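<br>
As a rough sketch of the second option, the attach step can be scripted so that gdb prints one backtrace and exits.<br>
The helper names and the example PID below are hypothetical; -p, -batch, and -ex are standard gdb options.<br>
A backtrace that sits inside the solver's numerical routines suggests the process is computing, while one that sits in an MPI receive or wait call suggests it is waiting.<br>

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb command that attaches to `pid`, prints a backtrace, and exits."""
    # -p attaches to a running process, -batch exits once the commands have run,
    # and -ex "bt" asks gdb to print the current call stack.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def print_backtrace(pid):
    """Run gdb against `pid` and return its output.

    Assumes gdb is installed and the user is allowed to ptrace the process.
    """
    result = subprocess.run(gdb_backtrace_cmd(pid), capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # 12345 is a placeholder PID, not a value taken from this thread.
    print(" ".join(gdb_backtrace_cmd(12345)))
```

Running it a few times, a minute or so apart, shows whether the stack is changing.<br>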
<div class="HOEnZb"><div class="h5"><br>
> -Nachiket<br>
><br>
> On Thu, Dec 13, 2012 at 4:29 PM, Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br>
>><br>
>> On Thu, Dec 13, 2012 at 1:20 PM, Nachiket Gokhale <<a href="mailto:gokhalen@gmail.com">gokhalen@gmail.com</a>><br>
>> wrote:<br>
>> > I am trying to solve a complex-valued matrix equation, assembled using<br>
>> > MatCompositeMerge, with MUMPS and an LU preconditioner. It seems to me that<br>
>> > the solve is stuck in the factorization phase: it has been running for 20<br>
>> > minutes or so on 16 processes. A problem of the same size using reals<br>
>> > instead of complex was solved previously in approximately a minute using 4<br>
>> > processes. The MUMPS output from -mat_mumps_icntl_4 1 is at the end of this<br>
>> > email. Does anyone have any ideas about what the problem may be?<br>
>><br>
>> Complex arithmetic is much more expensive, and you can lose some of the<br>
>> optimizations made in the code. I think you have to wait longer than this.<br>
>> Also, you should try attaching the debugger to a process to see whether it<br>
>> is computing or waiting.<br>
>><br>
>> Matt<br>
>><br>
>> > Thanks,<br>
>> ><br>
>> > -Nachiket<br>
>> ><br>
>> ><br>
>> ><br>
>> > Entering ZMUMPS driver with JOB, N, NZ = 1 122370 0<br>
>> ><br>
>> > ZMUMPS 4.10.0<br>
>> > L U Solver for unsymmetric matrices<br>
>> > Type of parallelism: Working host<br>
>> ><br>
>> > ****** ANALYSIS STEP ********<br>
>> ><br>
>> > ** Max-trans not allowed because matrix is distributed<br>
>> > ... Structural symmetry (in percent)= 100<br>
>> > Density: NBdense, Average, Median = 0 42 26<br>
>> > Ordering based on METIS<br>
>> > A root of estimated size 2736 has been selected for Scalapack.<br>
>> ><br>
>> > Leaving analysis phase with ...<br>
>> > INFOG(1) = 0<br>
>> > INFOG(2) = 0<br>
>> > -- (20) Number of entries in factors (estim.) = 563723522<br>
>> > -- (3) Storage of factors (REAL, estimated) = 565185337<br>
>> > -- (4) Storage of factors (INT , estimated) = 3537003<br>
>> > -- (5) Maximum frontal size (estimated) = 15239<br>
>> > -- (6) Number of nodes in the tree = 7914<br>
>> > -- (32) Type of analysis effectively used = 1<br>
>> > -- (7) Ordering option effectively used = 5<br>
>> > ICNTL(6) Maximum transversal option = 0<br>
>> > ICNTL(7) Pivot order option = 7<br>
>> > Percentage of memory relaxation (effective) = 35<br>
>> > Number of level 2 nodes = 35<br>
>> > Number of split nodes = 8<br>
>> > RINFOG(1) Operations during elimination (estim)= 4.877D+12<br>
>> > Distributed matrix entry format (ICNTL(18)) = 3<br>
>> > ** Rank of proc needing largest memory in IC facto : 0<br>
>> > ** Estimated corresponding MBYTES for IC facto : 3661<br>
>> > ** Estimated avg. MBYTES per work. proc at facto (IC) : 2018<br>
>> > ** TOTAL space in MBYTES for IC factorization : 32289<br>
>> > ** Rank of proc needing largest memory for OOC facto : 0<br>
>> > ** Estimated corresponding MBYTES for OOC facto : 3462<br>
>> > ** Estimated avg. MBYTES per work. proc at facto (OOC) : 1787<br>
>> > ** TOTAL space in MBYTES for OOC factorization : 28599<br>
>> > Entering ZMUMPS driver with JOB, N, NZ = 2 122370 5211070<br>
>> ><br>
>> > ****** FACTORIZATION STEP ********<br>
>> ><br>
>> ><br>
>> > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...<br>
>> > NUMBER OF WORKING PROCESSES = 16<br>
>> > OUT-OF-CORE OPTION (ICNTL(22)) = 0<br>
>> > REAL SPACE FOR FACTORS = 565185337<br>
>> > INTEGER SPACE FOR FACTORS = 3537003<br>
>> > MAXIMUM FRONTAL SIZE (ESTIMATED) = 15239<br>
>> > NUMBER OF NODES IN THE TREE = 7914<br>
>> > Convergence error after scaling for ONE-NORM (option 7/8) = 0.79D+00<br>
>> > Maximum effective relaxed size of S = 199523439<br>
>> > Average effective relaxed size of S = 98303057<br>
>> ><br>
>> > REDISTRIB: TOTAL DATA LOCAL/SENT = 657185 14022665<br>
>> > GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.4805<br>
>> > ** Memory relaxation parameter ( ICNTL(14) ) : 35<br>
>> > ** Rank of processor needing largest memory in facto : 0<br>
>> > ** Space in MBYTES used by this processor for facto : 3661<br>
>> > ** Avg. Space in MBYTES per working proc during facto : 2018<br>
>> ><br>
>><br>
>><br>
>><br>
>> --<br>
>> What most experimenters take for granted before they begin their<br>
>> experiments is infinitely more interesting than any results to which<br>
>> their experiments lead.<br>
>> -- Norbert Wiener<br>
><br>
><br>
<br>
<br>
<br>
</div></div></blockquote></div><br></div>
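<br>
For a sense of scale, the figures in the quoted ZMUMPS analysis output can be turned into rough memory and throughput numbers.<br>
This is only a back-of-the-envelope sketch: it assumes the factor storage is counted in double-precision complex entries (16 bytes each), and it takes the "40 mins or so" reported at the top of this message as the factorization time. All variable and function names are made up for illustration.<br>

```python
# Figures copied from the ZMUMPS analysis output quoted in this thread.
FACTOR_ENTRIES = 565_185_337   # INFOG(3): estimated storage of factors (entries)
ELIMINATION_OPS = 4.877e12     # RINFOG(1): estimated operations during elimination
TOTAL_IC_MBYTES = 32_289       # total in-core space for factorization, in MB
NUM_PROCS = 16                 # number of working processes

# Assumptions, not taken from the log:
BYTES_PER_COMPLEX = 16         # double-precision complex entry
FACTOR_MINUTES = 40            # "40 mins or so per factorization"


def factor_bytes(entries, bytes_per_entry):
    """Memory needed to hold the factors, in bytes."""
    return entries * bytes_per_entry


def ops_per_second(total_ops, minutes):
    """Aggregate operation rate implied by the observed wall-clock time."""
    return total_ops / (minutes * 60.0)


if __name__ == "__main__":
    gib = factor_bytes(FACTOR_ENTRIES, BYTES_PER_COMPLEX) / 2**30
    print(f"factors alone: about {gib:.1f} GiB across all processes")
    print(f"average in-core space per process: {TOTAL_IC_MBYTES / NUM_PROCS:.0f} MB")
    rate = ops_per_second(ELIMINATION_OPS, FACTOR_MINUTES)
    print(f"implied aggregate rate: {rate:.3g} operations/s")
    print(f"implied rate per process: {rate / NUM_PROCS:.3g} operations/s")
```

The per-process average computed this way agrees with the 2018 MB that MUMPS itself reports, which is a quick check that the figures were read correctly.<br>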