[mpich-discuss] mpi/openmp hybrid seg fault

Jack D. Galloway jackg at lanl.gov
Fri Dec 23 13:43:02 CST 2011


I added "KMP_STACKSIZE=4G", and the same setting for OMP_STACKSIZE, to my
.bashrc file so that both are set at every login.  The "ulimit -s
unlimited" line is also in the .bashrc file and is taking effect, yet I
still get the same error.

I keep thinking the problem is something that does not arise when I'm
logged into the node (which is why it works when I launch a job while
logged in there), and that some settings aren't being pushed out when the
job runs through MPICH2, since it fails across nodes.  It also seems to be
tied to OpenMP, since it works without the OMP directives.  The stack size
and OMP_STACKSIZE are both set to either unlimited or "large", but the
failures still occur.  Also, any ideas what the fundamental difference is
between allocated arrays and static arrays, such that static arrays would
kill it?  It seems like a "simple" problem, but I have no idea how to fix
it.
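
One way to test the hypothesis above (that the .bashrc settings do not
reach processes started on other nodes) would be to launch a tiny script
in place of the real program and compare what each process reports.  This
is only a sketch and nothing in this thread confirms the cause; the
function name report_limits is hypothetical, and the variables are the
ones already named in this thread.

```shell
#!/bin/sh
# report_limits (hypothetical name): print the stack-related settings that
# are visible to the calling process, on one line.
report_limits() {
    printf 'host=%s stack=%s OMP_STACKSIZE=%s KMP_STACKSIZE=%s\n' \
        "$(uname -n)" \
        "$(ulimit -s)" \
        "${OMP_STACKSIZE:-unset}" \
        "${KMP_STACKSIZE:-unset}"
}

# Report once for the current process.
report_limits
```

Saved as an executable script and started with the same mpiexec
invocation and host list as the failing job, each process would print one
line.  If the lines from remote nodes show a small stack limit or "unset"
variables while the head node shows the expected values, the settings are
not being propagated.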

Thanks.

-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Anthony Chan
Sent: Thursday, December 22, 2011 3:32 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] mpi/openmp hybrid seg fault


Did you make sure both system and OpenMP runtime stack sizes are set on all
the nodes (not just the head node)?
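
If the settings turn out not to be in effect on every node, one possible
workaround is to set them in the process that starts the program rather
than relying on login files.  The helper below is a minimal sketch under
that assumption; run_with_stack is a hypothetical name, and the limit can
only be raised as far as each node's hard limit allows.

```shell
#!/bin/sh
# run_with_stack SIZE CMD [ARGS...]   (hypothetical helper)
# Runs CMD in a subshell after setting the stack-related settings there,
# so the caller's own environment is left untouched.
run_with_stack() (
    size=$1
    shift
    # Raise the soft stack limit up to the hard limit; ignore any failure.
    ulimit -S -s "$(ulimit -H -s)" 2>/dev/null || true
    # OMP_STACKSIZE is the portable variable; KMP_STACKSIZE is Intel's.
    export OMP_STACKSIZE="$size"
    export KMP_STACKSIZE="$size"
    # Replace the subshell with the requested command.
    exec "$@"
)
```

Placed in a small wrapper script that ends with a call such as
run_with_stack 4G "$@", it could be put between mpiexec and the real
executable so that every process applies the settings itself.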

----- Original Message -----
> I tried setting both KMP_STACKSIZE and OMP_STACKSIZE to very large 
> values (4G then 8G) but still got the same seg fault.
> 
> I tried to compile with gfortran to verify but there are a lot of
> "kind" variables that I believe are pinned to ifort (kmp_size_t_kind
> for example).
> When compiling with "-heap-arrays" I still received the same error.
> 
> Just a debugging question, if the stacksize was the problem, would 
> that also show up when running only on one node (which currently works 
> as long as it's the same node as where mpiexec is executed from)? Any 
> other ideas?
> Stacksize is already unlimited:
> 
> galloway at tebow:~/Flow3D/hybrid-test$ ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 193087
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) unlimited
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 193087
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> galloway at tebow:~/Flow3D/hybrid-test$
> 
> 
> 
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jeff Hammond
> Sent: Thursday, December 22, 2011 6:26 AM
> To: Anthony Chan; mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] mpi/openmp hybrid seg fault
> 
> The portable equivalent of KMP_STACKSIZE is OMP_STACKSIZE. We have 
> found on Blue Gene/P that when values of this parameter are too small, 
> segfaults occur without an obvious indication that this is the 
> problem. I assume this is not a platform-specific problem.
> 
> The Intel Fortran compiler puts all static arrays on the stack by
> default. You can change this. See
> http://software.intel.com/en-us/articles/intel-fortran-compiler-increased-stack-usage-of-80-or-higher-compilers-causes-segmentation-fault/
> for details, but I recommend you use "-heap-arrays". GNU Fortran
> (gfortran) does not have this problem, so recompiling with it is a 
> fast way to determine if stack allocation - independent of OpenMP - is 
> the problem.
> 
> Jeff
> 
> On Thu, Dec 22, 2011 at 1:22 AM, Anthony Chan <chan at mcs.anl.gov>
> wrote:
> >
> >
> > ----- Original Message -----
> >
> >> The fact that I need this to work for the much larger problem is 
> >> why simply changing static arrays to dynamic equivalents is not a 
> >> viable solution unfortunately.
> >
> > I didn't check your program, but based on your description of the
> > problem, you may want to increase the stack size of your threaded
> > program.
> > Try setting "ulimit -s" and "KMP_STACKSIZE" to what your program
> > needs. See ifort's manpage for the definition of KMP_STACKSIZE.
> >
> > A.Chan
> > _______________________________________________
> > mpich-discuss mailing list mpich-discuss at mcs.anl.gov To manage 
> > subscription options or unsubscribe:
> > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> 
> 
> 
> --
> Jeff Hammond
> Argonne Leadership Computing Facility
> University of Chicago Computation Institute
> jhammond at alcf.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
> https://wiki-old.alcf.anl.gov/index.php/User:Jhammond