On Wed, Dec 14, 2011 at 4:07 PM, Xiangdong Liang <span dir="ltr"><<a href="mailto:xdliang@gmail.com">xdliang@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I use MatNorm and VecNorm on A and b before I call KSPSolve, and both<br>
of them are finite. Then I use -fp_trap to catch where the NaN comes<br>
from. However, it traces down to an unlikely place, pthread_join. The gdb<br>
backtrace ("where") is given below. Can you give me some help? Thanks.<br></blockquote><div><br></div><div>Great. This is very helpful. It seems quite clear that this is a PaStiX problem. I</div><div>would submit this to the PaStiX development list.</div>
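Finite results from MatNorm and VecNorm are a quick screen for NaN or Inf entries, because such an entry normally propagates into the norm. A more direct check scans the entries themselves. The sketch below is plain C99 over a raw array of complex values, not PETSc code: count_nonfinite is a hypothetical helper name, and the sketch assumes you already have a pointer to the local entries (in PETSc, for example, obtained with VecGetArray).

```c
#include <complex.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not a PETSc routine): count the entries of a complex
 * array whose real or imaginary part is NaN or +/-Inf. If first_bad is not
 * NULL and at least one such entry exists, the index of the first one is
 * stored there; otherwise *first_bad is left unchanged. */
size_t count_nonfinite(const double complex *v, size_t n, size_t *first_bad)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        /* isfinite() is false for NaN and for +/-Inf */
        if (!isfinite(creal(v[i])) || !isfinite(cimag(v[i]))) {
            if (count == 0 && first_bad != NULL)
                *first_bad = i;
            ++count;
        }
    }
    return count;
}
```

Reporting the index of the first bad entry, not just a yes/no answer, makes it easier to map the problem back to a specific row of A or entry of b.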
<div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
0x00007faeb6d8cbe5 in pthread_join () from /lib64/libpthread.so.0<br>
#1 0x00007faeb8ab7d6b in sopalin_launch_thread (procnum=18, procnbr=24,<br>
ptr=0xffffffff, calc_thrdnbr=1,<br>
calc_routine=0x7faeb8a7c9f3 <Z_Ugmres_smp>, calc_data=0xfe78f0,<br>
comm_thrdnbr=0, comm_routine=0x7faeb8a82e4c <Z_Usopalin_updo_comm>,<br>
comm_data=0xfe78f0, ooc_thrdnbr=0, ooc_routine=0, ooc_data=0xfe78f0)<br>
at sopalin/src/sopalin_thread.c:235<br>
#2 0x00007faeb8a81e52 in Z_Ugmres_thread (datacode=0x11c8390,<br>
sopaparam=0x11c8520) at sopalin/src/raff.c:1174<br>
#3 0x00007faeb8a53993 in Z_pastix_task_raff (pastix_data=0x11c8390,<br>
pastix_comm=0x1e02690, n=210000, b=0x2528ec0, rhsnbr=1, loc2glob=0x0)<br>
at sopalin/src/pastix.c:3581<br>
#4 0x00007faeb8a54a0e in z_pastix (pastix_data=0x20defc0,<br>
pastix_comm=0x1e02690, n=210000, colptr=0x245b610, row=0x7faea68b7760,<br>
avals=0x7faea4005760, perm=0x238dd60, invp=0x20e1080, b=0x2528ec0, rhs=1,<br>
iparm=0x20df004, dparm=0x20df108) at sopalin/src/pastix.c:4262<br>
#5 0x00007faeb83b7fd5 in MatSolve_PaStiX (A=0x203e280, b=0x12140d0,<br>
x=0x1f1b430)<br>
at /home/xdliang/MyLocal/petsc-dev/src/mat/impls/aij/mpi/pastix/pastix.c:328<br>
#6 0x00007faeb7b51d7c in MatSolve (mat=0x203e280, b=0x12140d0, x=0x1f1b430)<br>
at /home/xdliang/MyLocal/petsc-dev/src/mat/interface/matrix.c:3106<br>
#7 0x00007faeb8540f0e in PCApply_LU (pc=0x1de4d20, x=0x12140d0, y=0x1f1b430)<br>
at /home/xdliang/MyLocal/petsc-dev/src/ksp/pc/impls/factor/lu/lu.c:204<br>
#8 0x00007faeb85df50b in PCApply (pc=0x1de4d20, x=0x12140d0, y=0x1f1b430)<br>
at /home/xdliang/MyLocal/petsc-dev/src/ksp/pc/interface/precon.c:383<br>
#9 0x00007faeb863c660 in KSPSolve_PREONLY (ksp=0x1e37590)<br>
at /home/xdliang/MyLocal/petsc-dev/src/ksp/ksp/impls/preonly/preonly.c:26<br>
#10 0x00007faeb86707fe in KSPSolve (ksp=0x1e37590, b=0x12140d0, x=0x1f1b430)<br>
at /home/xdliang/MyLocal/petsc-dev/src/ksp/ksp/interface/itfunc.c:429<br>
#11 0x000000000040a213 in EigenSolver_cmplx (data=0x7fffba7529b0, Linear=1,<br>
Eig=0, maxeigit=10) at EigenSolver_cmplx.c:66<br>
#12 0x0000000000408ddf in main (argc=77, argv=0x7fffba754df8)<br>
at mldos_cmplx.c:322<br>
<br>
Here is the error information:<br>
<br>
[18]PETSC ERROR: *** unknown floating point error occurred ***<br>
[18]PETSC ERROR: The specific exception can be determined by running in a debugger. When the<br>
[18]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3d)<br>
[18]PETSC ERROR: where the result is a bitwise OR of the following flags:<br>
[18]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20<br>
[18]PETSC ERROR: Try option -start_in_debugger<br>
[18]PETSC ERROR: likely location of problem given in stack below<br>
[18]PETSC ERROR: --------------------- Stack Frames ------------------------------------<br>
[18]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,<br>
[18]PETSC ERROR: INSTEAD the line number of the start of the function<br>
[18]PETSC ERROR: is given.<br>
[18]PETSC ERROR: [18] PetscDefaultFPTrap line 342 /home/xdliang/MyLocal/petsc-dev/src/sys/error/fp.c<br>
[18]PETSC ERROR: [18] MatSolve_PaStiX line 303 /home/xdliang/MyLocal/petsc-dev/src/mat/impls/aij/mpi/pastix/pastix.c<br>
[18]PETSC ERROR: [18] MatSolve line 3089 /home/xdliang/MyLocal/petsc-dev/src/mat/interface/matrix.c<br>
[18]PETSC ERROR: [18] PCApply_LU line 202 /home/xdliang/MyLocal/petsc-dev/src/ksp/pc/impls/factor/lu/lu.c<br>
[18]PETSC ERROR: [18] PCApply line 373 /home/xdliang/MyLocal/petsc-dev/src/ksp/pc/interface/precon.c<br>
[18]PETSC ERROR: [18] KSPSolve_PREONLY line 19 /home/xdliang/MyLocal/petsc-dev/src/ksp/ksp/impls/preonly/preonly.c<br>
[18]PETSC ERROR: [18] KSPSolve line 334 /home/xdliang/MyLocal/petsc-dev/src/ksp/ksp/interface/itfunc.c<br>
[18]PETSC ERROR: User provided function() line 0 in Unknown directoryUnknown file trapped floating point error<br>
<br>
<br>
<br>
<br>
<br>
On Tue, Dec 13, 2011 at 11:59 PM, Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br>
> On Tue, Dec 13, 2011 at 9:49 PM, Xiangdong Liang <<a href="mailto:xdliang@gmail.com">xdliang@gmail.com</a>> wrote:<br>
>><br>
>> Hello everyone,<br>
>><br>
>> I am solving a complex system Ax=b with PaStiX successfully on 20 processors,<br>
>> but it fails on 24 processors. The relative error reported by<br>
>> -mat_pastix_verbose becomes "nan" with 24 processors. What could be<br>
>> wrong? Can someone give me some hints on how to debug this? Thanks.<br>
><br>
><br>
> First, make sure you did not put any NaNs in your matrix or rhs.<br>
><br>
> Matt<br>
><br>
>><br>
>><br>
>> Xiangdong<br>
<span class="HOEnZb"><font color="#888888">><br>
> --<br>
> What most experimenters take for granted before they begin their experiments<br>
> is infinitely more interesting than any results to which their experiments<br>
> lead.<br>
> -- Norbert Wiener<br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>