[mpich-discuss] MPICH2, ONE OF THE PROCESSES TERMINATED BADLY: CLEANING UP

Darius Buntinas buntinas at mcs.anl.gov
Mon May 16 11:14:24 CDT 2011


I believe the older PGI compilers had a bug in how they handled inline assembly, which would fit the crash in OPA_cas_ptr at the top of your backtrace.  You'll need to upgrade to at least PGI 9.0-1. See:  http://wiki.mcs.anl.gov/mpich2/index.php/Compiler_Quirks
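
For context, frame #0 of the backtrace quoted below is in OPA_cas_ptr, a pointer compare-and-swap. Here is a rough sketch of what such an operation is supposed to do, written with C11 <stdatomic.h> instead of inline assembly. It is not the OpenPA code, and the names cas_ptr and cas_selfcheck are made up for illustration only:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper, not the OpenPA API: atomically replace *ptr with newv
 * if it currently equals oldv, and return the value that was observed. */
void *cas_ptr(void *_Atomic *ptr, void *oldv, void *newv)
{
    void *expected = oldv;
    /* On failure, expected is overwritten with the current value of *ptr;
     * on success it still holds oldv, which is the value that was observed. */
    atomic_compare_exchange_strong(ptr, &expected, newv);
    return expected;
}

/* Returns 0 if the compare-and-swap behaves as expected, nonzero otherwise. */
int cas_selfcheck(void)
{
    int a = 1, b = 2;
    void *_Atomic slot = &a;

    /* Successful swap: slot holds &a, so it becomes NULL and &a is returned. */
    if (cas_ptr(&slot, &a, NULL) != &a || atomic_load(&slot) != NULL)
        return 1;

    /* Failed swap: slot holds NULL, not &b, so it must stay NULL. */
    if (cas_ptr(&slot, &b, &a) != NULL || atomic_load(&slot) != NULL)
        return 2;

    return 0;
}
```

Calling cas_selfcheck() from a small main() and checking for a zero return is one quick way to sanity-check a compiler's atomics, assuming the compiler supports C11 atomics at all. A compiler that mishandles the equivalent inline assembly could fail or crash in this kind of operation.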

-d

On May 14, 2011, at 10:05 PM, Wei Huang wrote:

> Anthony,
> 
> I tried running it and then ran gdb on the core.* file; the result is below.
> Is this what you meant?
> 
> Thanks,
> 
> 
> Wei Huang
> 
> ----------------
> 
> 
> 
> 
> mpirun -np 2 ./cpi
> 
> Process 0 of 2 is on neem
> Process 1 of 2 is on neem
> pi is approximately 3.1415926544231318, Error is 0.0000000008333387
> wall clock time = 0.000262
> 
> =====================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 139
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
> APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
> 
> 
> gdb ./cpi core.21103 
> GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /neem2/huangwei/src/mpich2-1.4rc2/examples/cpi...done.
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libnss_files.so.2
> 
> warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff32459000
> Core was generated by `./cpi'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x000000000043369f in OPA_cas_ptr (ptr=0x2ad369f821c8, oldv=0x4200c0, newv=0x0) at ch3_progress.c:102
> 102     static int check_terminating_vcs(void)
> (gdb) where
> #0  0x000000000043369f in OPA_cas_ptr (ptr=0x2ad369f821c8, oldv=0x4200c0, newv=0x0) at ch3_progress.c:102
> #1  0x00000000004335e7 in MPID_NEM_CAS_REL_NULL (ptr=0x2ad369f821c8, oldv=...) at ch3_progress.c:67
> #2  0x0000000000433160 in MPID_nem_mpich2_blocking_recv (cell=0x7fff32409958, in_fbox=0x7fff32409940, completions=1)
>    at ch3_progress.c:931
> #3  0x000000000042fb42 in MPIDI_CH3I_Progress (progress_state=0x7fff3240998c, is_blocking=1) at ch3_progress.c:396
> #4  0x0000000000434116 in MPIDI_CH3U_VC_WaitForClose () at ch3u_handle_connection.c:376
> #5  0x0000000000447a5a in MPID_Finalize () at mpid_finalize.c:105
> #6  0x00000000004196b6 in PMPI_Finalize () at finalize.c:191
> #7  0x0000000000402e39 in main (argc=1, argv=0x7fff32409ba8) at cpi.c:62
> (gdb) 
> 
> On May 14, 2011, at 6:03 PM, Anthony Chan wrote:
> 
>> 
>> Both PGI 6.x and 8.x are old compilers, and we don't have access
>> to compilers that old to reproduce the segfault. Could you
>> provide a backtrace of the segfault?
>> 
>> A.Chan
>> 
>> ----- Original Message -----
>>> Hi, there,
>>> 
>>> 
>>> I am trying to install mpich2-1.4rc2 on a Linux cluster with PGI.
>>> When I use an old version (6.2.5) of PGI to build it, the MPI example
>>> works fine.
>>> (But I have problems building other software with that version.)
>>> 
>>> But with a newer version (8.0.4), when I run the example program
>>> after the build, I get a segmentation fault with 2 processes.
>>> 
>>> Does anyone know what is wrong here?
>>> 
>>> Thanks,
>>> 
>>> Wei
>>> 
>>> ----------------
>>> 
>>> mpiexec -n 1 ./cpi
>>> 
>>> Process 0 of 1 is on neem
>>> pi is approximately 3.1415926544231341, Error is 0.0000000008333410
>>> wall clock time = 0.000275
>>> 
>>> 
>>> mpiexec -n 2 ./cpi
>>> Process 1 of 2 is on neem
>>> Process 0 of 2 is on neem
>>> pi is approximately 3.1415926544231318, Error is 0.0000000008333387
>>> wall clock time = 0.000192
>>> 
>>> =====================================================================================
>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> = EXIT CODE: 11
>>> = CLEANING UP REMAINING PROCESSES
>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> =====================================================================================
>>> APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
>>> 
>>> 
>>> _______________________________________________
>>> mpich-discuss mailing list
>>> mpich-discuss at mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


