[mpich-discuss] MPICH2, ONE OF THE PROCESSES TERMINATED BADLY: CLEANING UP
Darius Buntinas
buntinas at mcs.anl.gov
Mon May 16 11:14:24 CDT 2011
I believe the old PGI compilers had a bug in how they handled inline assembly. You'll need to upgrade to at least 9.0-1. See: http://wiki.mcs.anl.gov/mpich2/index.php/Compiler_Quirks
-d
On May 14, 2011, at 10:05 PM, Wei Huang wrote:
> Anthony,
>
> I tried running it, and then ran gdb on the core.* file; the result is below.
> Is this what you mean?
>
> Thanks,
>
>
> Wei Huang
>
> ----------------
>
>
>
>
> mpirun -np 2 ./cpi
>
> Process 0 of 2 is on neem
> Process 1 of 2 is on neem
> pi is approximately 3.1415926544231318, Error is 0.0000000008333387
> wall clock time = 0.000262
>
> =====================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = EXIT CODE: 139
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
> APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
>
>
> gdb ./cpi core.21103
> GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /neem2/huangwei/src/mpich2-1.4rc2/examples/cpi...done.
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libnss_files.so.2
>
> warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff32459000
> Core was generated by `./cpi'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x000000000043369f in OPA_cas_ptr (ptr=0x2ad369f821c8, oldv=0x4200c0, newv=0x0) at ch3_progress.c:102
> 102 static int check_terminating_vcs(void)
> (gdb) where
> #0 0x000000000043369f in OPA_cas_ptr (ptr=0x2ad369f821c8, oldv=0x4200c0, newv=0x0) at ch3_progress.c:102
> #1 0x00000000004335e7 in MPID_NEM_CAS_REL_NULL (ptr=0x2ad369f821c8, oldv=...) at ch3_progress.c:67
> #2 0x0000000000433160 in MPID_nem_mpich2_blocking_recv (cell=0x7fff32409958, in_fbox=0x7fff32409940, completions=1)
> at ch3_progress.c:931
> #3 0x000000000042fb42 in MPIDI_CH3I_Progress (progress_state=0x7fff3240998c, is_blocking=1) at ch3_progress.c:396
> #4 0x0000000000434116 in MPIDI_CH3U_VC_WaitForClose () at ch3u_handle_connection.c:376
> #5 0x0000000000447a5a in MPID_Finalize () at mpid_finalize.c:105
> #6 0x00000000004196b6 in PMPI_Finalize () at finalize.c:191
> #7 0x0000000000402e39 in main (argc=1, argv=0x7fff32409ba8) at cpi.c:62
> (gdb)
>
> On May 14, 2011, at 6:03 PM, Anthony Chan wrote:
>
>>
>> Both PGI 6.x and 8.x are old compilers; we don't have access
>> to such old PGI compilers to reproduce the segfault. Could you
>> provide a backtrace of the segfault?
>>
>> A.Chan
>>
>> ----- Original Message -----
>>> Hi, there,
>>>
>>>
>>> I am trying to install mpich2-1.4rc2 on a Linux cluster with PGI.
>>> When I use an old version (6.2.5) of PGI to build, the MPI example
>>> works fine.
>>> (But it has problems building other software.)
>>>
>>> But with a newer version (8.0.4), after the build, when I run the
>>> example program, I get a segfault with two processors.
>>>
>>> Does anyone know what is wrong here?
>>>
>>> Thanks,
>>>
>>> Wei
>>>
>>> ----------------
>>>
>>> mpiexec -n 1 ./cpi
>>>
>>> Process 0 of 1 is on neem
>>> pi is approximately 3.1415926544231341, Error is 0.0000000008333410
>>> wall clock time = 0.000275
>>>
>>>
>>> mpiexec -n 2 ./cpi
>>> Process 1 of 2 is on neem
>>> Process 0 of 2 is on neem
>>> pi is approximately 3.1415926544231318, Error is 0.0000000008333387
>>> wall clock time = 0.000192
>>>
>>> =====================================================================================
>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> = EXIT CODE: 11
>>> = CLEANING UP REMAINING PROCESSES
>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> =====================================================================================
>>> APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault
>>> (signal 11)
>>>
>>>
>>> _______________________________________________
>>> mpich-discuss mailing list
>>> mpich-discuss at mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>