[mpich-discuss] MPICH2, ONE OF THE PROCESSES TERMINATED BADLY: CLEANING UP

Wei Huang huangwei at ucar.edu
Sat May 14 22:05:50 CDT 2011


Anthony,

I ran it again and then opened the core.* file in gdb; the result is below.
Is this what you meant?

Thanks,


Wei Huang

----------------




mpirun -np 2 ./cpi

Process 0 of 2 is on neem
Process 1 of 2 is on neem
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000262

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
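
For reference, the exit code 139 above is consistent with the common shell convention of reporting a signal death as 128 plus the signal number (139 - 128 = 11, SIGSEGV). A minimal sketch of that decoding follows; the helper name describe_exit_code is hypothetical, and it assumes the launcher follows the 128 + N convention, which this output does not confirm on its own.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit code, assuming the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name for the signal number on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


print(describe_exit_code(139))  # killed by signal 11 (SIGSEGV)
```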


gdb ./cpi core.21103 
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /neem2/huangwei/src/mpich2-1.4rc2/examples/cpi...done.
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff32459000
Core was generated by `./cpi'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000043369f in OPA_cas_ptr (ptr=0x2ad369f821c8, oldv=0x4200c0, newv=0x0) at ch3_progress.c:102
102     static int check_terminating_vcs(void)
(gdb) where
#0  0x000000000043369f in OPA_cas_ptr (ptr=0x2ad369f821c8, oldv=0x4200c0, newv=0x0) at ch3_progress.c:102
#1  0x00000000004335e7 in MPID_NEM_CAS_REL_NULL (ptr=0x2ad369f821c8, oldv=...) at ch3_progress.c:67
#2  0x0000000000433160 in MPID_nem_mpich2_blocking_recv (cell=0x7fff32409958, in_fbox=0x7fff32409940, completions=1)
    at ch3_progress.c:931
#3  0x000000000042fb42 in MPIDI_CH3I_Progress (progress_state=0x7fff3240998c, is_blocking=1) at ch3_progress.c:396
#4  0x0000000000434116 in MPIDI_CH3U_VC_WaitForClose () at ch3u_handle_connection.c:376
#5  0x0000000000447a5a in MPID_Finalize () at mpid_finalize.c:105
#6  0x00000000004196b6 in PMPI_Finalize () at finalize.c:191
#7  0x0000000000402e39 in main (argc=1, argv=0x7fff32409ba8) at cpi.c:62
(gdb) 
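
Read from the bottom up, the backtrace shows the crash happening inside MPI_Finalize: main calls PMPI_Finalize, which ends up in the nemesis progress engine and faults in OPA_cas_ptr. A small sketch for pulling the call chain out of pasted gdb output like this follows. The names parse_frames and FRAME are hypothetical, and the regular expression is an assumption that only covers frames of the shape shown above (function, argument list, then "at file:line").

```python
import re

BACKTRACE = """\
#0  0x000000000043369f in OPA_cas_ptr (ptr=0x2ad369f821c8, oldv=0x4200c0, newv=0x0) at ch3_progress.c:102
#1  0x00000000004335e7 in MPID_NEM_CAS_REL_NULL (ptr=0x2ad369f821c8, oldv=...) at ch3_progress.c:67
#2  0x0000000000433160 in MPID_nem_mpich2_blocking_recv (cell=0x7fff32409958, in_fbox=0x7fff32409940, completions=1)
    at ch3_progress.c:931
#3  0x000000000042fb42 in MPIDI_CH3I_Progress (progress_state=0x7fff3240998c, is_blocking=1) at ch3_progress.c:396
#4  0x0000000000434116 in MPIDI_CH3U_VC_WaitForClose () at ch3u_handle_connection.c:376
#5  0x0000000000447a5a in MPID_Finalize () at mpid_finalize.c:105
#6  0x00000000004196b6 in PMPI_Finalize () at finalize.c:191
#7  0x0000000000402e39 in main (argc=1, argv=0x7fff32409ba8) at cpi.c:62
"""

# Frame number, optional address, function name, argument list, source file and line.
FRAME = re.compile(r"#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \(.*?\)\s+at (\S+):(\d+)")


def parse_frames(text):
    """Return (frame number, function, "file:line") tuples from gdb 'where' output."""
    # gdb wraps long frames; fold indented continuation lines into the frame above.
    joined = re.sub(r"\n[ \t]+", " ", text)
    return [(int(num), func, f"{src}:{line}")
            for num, func, src, line in FRAME.findall(joined)]


frames = parse_frames(BACKTRACE)
# Print the call chain from the outermost caller down to the faulting frame.
print(" -> ".join(func for _, func, _ in reversed(frames)))
```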

On May 14, 2011, at 6:03 PM, Anthony Chan wrote:

> 
> Both pgi 6.x and 8.x are old compilers, and we don't have access
> to such old pgi compilers to reproduce the segfault. Could you
> provide a backtrace of the segfault?
> 
> A.Chan
> 
> ----- Original Message -----
>> Hi, there,
>> 
>> 
>> I am trying to install mpich2-1.4rc2 on a Linux cluster with PGI.
>> When I build with an old version of PGI (6.2.5), the MPI example
>> works fine.
>> (But I have problems building other software.)
>> 
>> But with a newer version (8.0.4), when I run the example
>> program after the build, I get a segmentation fault with 2 processors.
>> 
>> Does anyone know what is wrong here?
>> 
>> Thanks,
>> 
>> Wei
>> 
>> ----------------
>> 
>> mpiexec -n 1 ./cpi
>> 
>> Process 0 of 1 is on neem
>> pi is approximately 3.1415926544231341, Error is 0.0000000008333410
>> wall clock time = 0.000275
>> 
>> 
>> mpiexec -n 2 ./cpi
>> Process 1 of 2 is on neem
>> Process 0 of 2 is on neem
>> pi is approximately 3.1415926544231318, Error is 0.0000000008333387
>> wall clock time = 0.000192
>> 
>> =====================================================================================
>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> = EXIT CODE: 11
>> = CLEANING UP REMAINING PROCESSES
>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> =====================================================================================
>> APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault
>> (signal 11)
>> 
>> 
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


