[mpich-discuss] mpich2 hangs on Ubuntu beowulf cluster(with NFS) with patch

Darius Buntinas buntinas at mcs.anl.gov
Mon Jan 23 11:37:41 CST 2012


Did the program get further in the run after you applied the patch than before?

To see where the segfault occurred, enable core dumps, then get a stack trace from a debugger:

  * Add "ulimit -c unlimited" to your .bashrc (assuming you're using bash).
  * Run your code again.  When you get a segfault, you should get one or more core.XXXX files (where XXXX is the process id).
  * Open the core file in gdb: "gdb appname core.XXXX" where "appname" is the name of the executable.
  * In gdb, type "bt" to get the stack trace and send us the output.
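
As a rough sketch of the steps above: the script below raises the core-size limit for the current shell, then prints a gdb command for each core file found in the current directory. The executable name "./appname" is a placeholder and the helper backtrace_cmd is hypothetical. It also assumes core files land in the working directory as core.XXXX, which depends on how the system is configured.

```shell
#!/usr/bin/env bash
# Sketch only: "./appname" is a placeholder for your executable, and
# backtrace_cmd is a hypothetical helper, not part of MPICH2 or gdb.

# Allow core files of any size in this shell. This can fail if the hard
# limit is lower, so report it instead of aborting.
ulimit -c unlimited 2>/dev/null || \
    echo "could not raise the core size limit; check the hard limit" >&2

# Build a gdb command that loads one core file, prints the stack trace
# ("bt") and exits (-batch).
backtrace_cmd() {
    local app="$1" core="$2"
    printf 'gdb -batch -ex bt %s %s\n' "$app" "$core"
}

# Print one gdb command per core file in the current directory.
for core in core.*; do
    [ -e "$core" ] || continue   # no core files: the glob did not match
    backtrace_cmd ./appname "$core"
done
```

Each printed command can be run as is, and its output is the stack trace to send to the list.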

-d


On Jan 23, 2012, at 10:49 AM, Gustavo Correa wrote:

> Hi Konstantinos
> 
> It sounds like an ocean model code ...
> Sometimes signal 11, segmentation fault, happens because of a small stack size.
> Did you set the stack size on your compute nodes to a large value, or to unlimited?
> You may need to ask the sys admin to change it in /etc/security/limits.conf.
> 
> I hope this helps,
> Gus Correa
> 
> On Jan 23, 2012, at 6:49 AM, Konstantinos Varotsos wrote:
> 
>> 
>> 
>> Hi there, I rebuilt mpich2 with the patch.
>> 
>> 
>> Now when running the code I get
>> 
>> [0]  ***** SCRATCH RUN *****
>> [0]
>> [0]
>> [0]            WT =  0.2400E+00,  U* =  0.9615E+00,  L = -0.2832E+03
>> [0]            DTDZ FREE =  0.3000E-02,  ZODY=  0.4176E+01
>> [0]            ZO(BTM) =  0.1600E+00,  CDBTM=  0.9175E-02,  UG =   0.1000E+02
>> [0]            NNX =    96,  NNY =    96,  NNZ =    96
>> [0]            SFC SMLT = 1,  FILTER = 0,  ITI =      0,  ITMAX =    100
>> [0]            IUPWIND = 1,  BUYNCY = 1,  NO ALISNG = 1,  ITCUT = 1
>> [0]            DT =    0.000000E+00,  ZO =    0.160000E+00,  TS =    0.301928E+03,  SUBSD = 0
>> [0]            BRCLICITY = 0,  METHOD = 0,  IOCEAN = 0,  IVIS = 0
>> [7]  size of stats array =     4050
>> [0]            DSL =    0.502859E+02
>> [0]  Search for zi above the height =    0.300000E+02
>> [0]  iz_min =     1
>> 
>> =====================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   EXIT CODE: 11
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> =====================================================================================
>> 
>> 
>> Thanks, Kwstas
>> 
>> mpich2version
>> 
>> MPICH2 Version:        1.4.1p1
>> MPICH2 Release date:    Thu Sep  1 13:53:02 CDT 2011
>> MPICH2 Device:        ch3:nemesis
>> MPICH2 configure:     --prefix=/mirror/mpiuser/mpich2-install
>> MPICH2 CC:     gcc    -O2
>> MPICH2 CXX:     c++   -O2
>> MPICH2 F77:     ifort   -O2
>> MPICH2 FC:     ifort   -O2
>> 
>> _______________________________________________
>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> 


