[mpich-discuss] mpi/openmp hybrid seg fault

Rajeev Thakur thakur at mcs.anl.gov
Fri Dec 23 23:09:24 CST 2011


Try printing the environment variables from each rank to see if they have been set everywhere.
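
For example, a minimal Fortran sketch along these lines (untested; the
variable name and buffer size are arbitrary) would show whether the
settings made it to every rank:

  program check_env
    use mpi
    implicit none
    integer :: rank, ierr, length, stat
    character(len=64) :: val
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    ! stat comes back nonzero if the variable is unset on this rank
    call get_environment_variable("OMP_STACKSIZE", val, length, stat)
    print *, "rank", rank, ": OMP_STACKSIZE=", trim(val), ", stat=", stat
    call MPI_Finalize(ierr)
  end program check_env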

Rajeev 


On Dec 23, 2011, at 1:43 PM, Jack D. Galloway wrote:

> I added "KMP_STACKSIZE=4G" (and the same for OMP_STACKSIZE) to my
> .bashrc file to make sure it is set at every login.  In addition, the
> "ulimit -s unlimited" line is in the .bashrc file and is being applied,
> yet I still get this error.
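> 
> Concretely, the relevant .bashrc lines look roughly like this (the 4G
> value is just what I'm using at the moment):
> 
>   export KMP_STACKSIZE=4G
>   export OMP_STACKSIZE=4G
>   ulimit -s unlimited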
> 
> I keep thinking this must be a setting that is fine whenever I'm logged
> into the node (hence why a job works when I launch it while logged in),
> but that isn't propagated when running through MPICH2, since it fails
> across nodes.  It also seems tied to OpenMP, since the code works
> without OpenMP directives.  Stack size and OMP_STACKSIZE are both set
> to either unlimited or "large", yet the failures still occur.  Also,
> any idea what the fundamental difference between allocated arrays and
> static arrays is, such that static arrays would kill it?  It seems like
> a "simple" problem, but I have no idea how to fix it.
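> 
> (By "static" versus "allocated" I mean roughly the following; this is a
> hypothetical sketch, not my actual code:
> 
>   subroutine work()
>     real :: a(50000000)        ! fixed-size local: with -openmp, ifort
>                                ! puts locals like this on the stack
>     real, allocatable :: b(:)  ! allocatable: allocated on the heap
>     allocate(b(50000000))
>     a = 1.0
>     b = 1.0
>     print *, sum(a) + sum(b)
>     deallocate(b)
>   end subroutine work
> 
> The a-style arrays are the ones that kill it; the b-style ones work.)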
> 
> Thanks.
> 
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Anthony Chan
> Sent: Thursday, December 22, 2011 3:32 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] mpi/openmp hybrid seg fault
> 
> 
> Did you make sure both the system and OpenMP runtime stack sizes are set on
> all the nodes (not just the head node)?
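> 
> A quick way to check (assuming mpiexec can launch plain shell commands
> on your nodes) is something like:
> 
>   mpiexec -n 4 sh -c 'echo "$(hostname): stack=$(ulimit -s) OMP_STACKSIZE=$OMP_STACKSIZE"'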
> 
> ----- Original Message -----
>> I tried setting both KMP_STACKSIZE and OMP_STACKSIZE to very large 
>> values (4G then 8G) but still got the same seg fault.
>> 
>> I tried to compile with gfortran to verify, but there are a lot of
>> "kind" variables that I believe are specific to ifort (kmp_size_t_kind,
>> for example).
>> When compiling with "-heap-arrays" I still received the same error.
>> 
>> Just a debugging question: if the stack size were the problem, would
>> it also show up when running on only one node (which currently works,
>> as long as it's the same node mpiexec is executed from)?  Any other
>> ideas?  The stack size is already unlimited:
>> 
>> galloway at tebow:~/Flow3D/hybrid-test$ ulimit -a
>> core file size          (blocks, -c) 0
>> data seg size           (kbytes, -d) unlimited
>> scheduling priority             (-e) 0
>> file size               (blocks, -f) unlimited
>> pending signals                 (-i) 193087
>> max locked memory       (kbytes, -l) unlimited
>> max memory size         (kbytes, -m) unlimited
>> open files                      (-n) 1024
>> pipe size            (512 bytes, -p) 8
>> POSIX message queues     (bytes, -q) 819200
>> real-time priority              (-r) 0
>> stack size              (kbytes, -s) unlimited
>> cpu time               (seconds, -t) unlimited
>> max user processes              (-u) 193087
>> virtual memory          (kbytes, -v) unlimited
>> file locks                      (-x) unlimited
>> galloway at tebow:~/Flow3D/hybrid-test$
>> 
>> 
>> 
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov 
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jeff Hammond
>> Sent: Thursday, December 22, 2011 6:26 AM
>> To: Anthony Chan; mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] mpi/openmp hybrid seg fault
>> 
>> The portable equivalent of KMP_STACKSIZE is OMP_STACKSIZE. We have 
>> found on Blue Gene/P that when values of this parameter are too small, 
>> segfaults occur without an obvious indication that this is the 
>> problem. I assume this is not a platform-specific problem.
>> 
>> The Intel Fortran compiler puts all static arrays on the stack by 
>> default.
>> You can change this. See
>> http://software.intel.com/en-us/articles/intel-fortran-compiler-increased-stack-usage-of-80-or-higher-compilers-causes-segmentation-fault/
>> for details, but I recommend you use "-heap-arrays". GNU Fortran
>> (gfortran) does not have this problem, so recompiling with it is a 
>> fast way to determine if stack allocation - independent of OpenMP - is 
>> the problem.
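>> 
>> Concretely, the -heap-arrays route would look something like this
>> (file name and stack size are placeholders, assuming an ifort-backed
>> mpif90):
>> 
>>   mpif90 -openmp -heap-arrays hybrid.f90 -o hybrid
>>   export OMP_STACKSIZE=512M
>>   mpiexec -n 8 ./hybrid
>> 
>> (And per the discussion above, OMP_STACKSIZE has to actually reach the
>> environment on every node, not just the one where you run mpiexec.)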
>> 
>> Jeff
>> 
>> On Thu, Dec 22, 2011 at 1:22 AM, Anthony Chan <chan at mcs.anl.gov>
>> wrote:
>>> 
>>> 
>>> ----- Original Message -----
>>> 
>>>> The fact that I need this to work for the much larger problem is
>>>> why simply changing static arrays to dynamic equivalents is,
>>>> unfortunately, not a viable solution.
>>> 
>>> I didn't check your program, but based on your description of the
>>> problem, you may want to increase the stack size of your threaded
>>> program.  Try setting "ulimit -s" and "KMP_STACKSIZE" to what your
>>> program needs.  See the ifort manpage for the definition of
>>> KMP_STACKSIZE.
>>> 
>>> A.Chan
>>> _______________________________________________
>>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>>> To manage subscription options or unsubscribe:
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>> 
>> 
>> 
>> --
>> Jeff Hammond
>> Argonne Leadership Computing Facility
>> University of Chicago Computation Institute
>> jhammond at alcf.anl.gov / (630) 252-5381
>> http://www.linkedin.com/in/jeffhammond
>> https://wiki-old.alcf.anl.gov/index.php/User:Jhammond
>> _______________________________________________
>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>> 
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


