[mpich-discuss] mpi/openmp hybrid seg fault
Rajeev Thakur
thakur at mcs.anl.gov
Fri Dec 23 23:09:24 CST 2011
Try printing the environment variables from each rank to see if they have been set everywhere.
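
For example, a minimal sketch (program name and buffer length are
illustrative, not from your code) that prints the relevant variables from
every rank:

  program check_env
    use mpi
    implicit none
    integer :: ierr, rank
    character(len=64) :: val
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    ! get_environment_variable returns blanks if the variable is unset,
    ! which immediately identifies nodes where .bashrc was not sourced
    call get_environment_variable('KMP_STACKSIZE', val)
    print *, 'rank', rank, ': KMP_STACKSIZE=', trim(val)
    call get_environment_variable('OMP_STACKSIZE', val)
    print *, 'rank', rank, ': OMP_STACKSIZE=', trim(val)
    call MPI_Finalize(ierr)
  end program check_env

Run it with mpiexec across the same set of nodes as the failing job; any
rank that prints an empty value points at a node where the environment is
not what you expect.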
Rajeev
On Dec 23, 2011, at 1:43 PM, Jack D. Galloway wrote:
> I added "KMP_STACKSIZE=4G", and the same for OMP_STACKSIZE, to my .bashrc
> file to make sure it is set at each login. In addition, the
> "ulimit -s unlimited" line is in the .bashrc file and is being applied,
> yet I still get this error.
>
> I keep thinking this must be something that is not a problem whenever I'm
> logged into the node (hence why it works whenever I launch a job while
> logged in), but that some settings aren't pushed out when running through
> MPICH2, since it fails across nodes ... and it seems to be tied to OpenMP,
> since it works without OpenMP directives. Stack size and OMP_STACKSIZE are
> both set to either unlimited or "large", but failures still occur. Also,
> any idea what the fundamental difference between allocated arrays and
> static arrays is, such that static arrays would kill it? It seems like a
> "simple" problem, but I have no idea how to fix it.
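>
> For concreteness, here is a contrived sketch (not my actual code; the
> sizes are made up) of the two forms I mean:
>
>   subroutine static_vs_allocated
>     real :: a(100000000)            ! fixed-size local: ifort puts this on
>                                     ! the (thread) stack by default
>     real, allocatable :: b(:)
>     allocate(b(100000000))          ! allocatable: lives on the heap,
>                                     ! unaffected by stack limits
>     a = 0.0
>     b = 0.0
>     deallocate(b)
>   end subroutine static_vs_allocated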
>
> Thanks.
>
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Anthony Chan
> Sent: Thursday, December 22, 2011 3:32 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] mpi/openmp hybrid seg fault
>
>
> Did you make sure both the system and OpenMP runtime stack sizes are set
> on all the nodes (not just the head node)?
>
> ----- Original Message -----
>> I tried setting both KMP_STACKSIZE and OMP_STACKSIZE to very large
>> values (4G then 8G) but still got the same seg fault.
>>
>> I tried to compile with gfortran to verify, but there are a lot of
>> "kind" variables that I believe are tied to ifort (kmp_size_t_kind, for
>> example). When compiling with "-heap-arrays", I still received the same
>> error.
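>>
>> To illustrate the kind of ifort-only code I mean (a sketch with a
>> made-up subroutine name): Intel's omp_lib defines kmp_size_t_kind and
>> the kmp_set_stacksize_s() extension, neither of which gfortran's
>> omp_lib provides, so this fails to compile there:
>>
>>   subroutine grow_thread_stacks
>>     use omp_lib
>>     ! Intel-specific kind and routine; must be called before the first
>>     ! parallel region to take effect
>>     integer (kind=kmp_size_t_kind) :: nbytes
>>     nbytes = 4294967296_kmp_size_t_kind   ! 4 GB per thread
>>     call kmp_set_stacksize_s(nbytes)
>>   end subroutine grow_thread_stacks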
>>
>> Just a debugging question: if the stack size were the problem, would
>> that also show up when running on only one node (which currently works,
>> as long as it's the same node that mpiexec is executed from)? Any other
>> ideas?
>> Stack size is already unlimited:
>>
>> galloway at tebow:~/Flow3D/hybrid-test$ ulimit -a
>> core file size          (blocks, -c) 0
>> data seg size           (kbytes, -d) unlimited
>> scheduling priority             (-e) 0
>> file size               (blocks, -f) unlimited
>> pending signals                 (-i) 193087
>> max locked memory       (kbytes, -l) unlimited
>> max memory size         (kbytes, -m) unlimited
>> open files                      (-n) 1024
>> pipe size            (512 bytes, -p) 8
>> POSIX message queues     (bytes, -q) 819200
>> real-time priority              (-r) 0
>> stack size              (kbytes, -s) unlimited
>> cpu time               (seconds, -t) unlimited
>> max user processes              (-u) 193087
>> virtual memory          (kbytes, -v) unlimited
>> file locks                      (-x) unlimited
>> galloway at tebow:~/Flow3D/hybrid-test$
>>
>>
>>
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jeff Hammond
>> Sent: Thursday, December 22, 2011 6:26 AM
>> To: Anthony Chan; mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] mpi/openmp hybrid seg fault
>>
>> The portable equivalent of KMP_STACKSIZE is OMP_STACKSIZE. We have
>> found on Blue Gene/P that when values of this parameter are too small,
>> segfaults occur without an obvious indication that this is the
>> problem. I assume this is not a platform-specific problem.
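>>
>> A contrived reproducer of that failure mode (sizes and names are
>> illustrative): every thread gets its own copy of the automatic array on
>> its own stack, so a small OMP_STACKSIZE dies with a bare segfault:
>>
>>   program thread_stack_demo
>>     use omp_lib
>>     implicit none
>>     integer :: n
>>     n = 10000000                    ! 40 MB per thread; default thread
>>                                     ! stacks are often only a few MB
>>   !$omp parallel
>>     call work(n)
>>   !$omp end parallel
>>   contains
>>     subroutine work(n)
>>       integer, intent(in) :: n
>>       real :: tmp(n)                ! automatic array on the thread stack
>>       tmp = real(omp_get_thread_num())
>>       print *, 'thread', omp_get_thread_num(), sum(tmp)
>>     end subroutine work
>>   end program thread_stack_demo
>>
>> Setting OMP_STACKSIZE=100M (or KMP_STACKSIZE with ifort) should let the
>> same binary run cleanly.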
>>
>> The Intel Fortran compiler puts all static arrays on the stack by
>> default.
>> You can change this. See
>> http://software.intel.com/en-us/articles/intel-fortran-compiler-increased-stack-usage-of-80-or-higher-compilers-causes-segmentation-fault/
>> for details, but I recommend you use "-heap-arrays". GNU Fortran
>> (gfortran) does not have this problem, so recompiling with it is a
>> fast way to determine if stack allocation - independent of OpenMP - is
>> the problem.
>>
>> Jeff
>>
>> On Thu, Dec 22, 2011 at 1:22 AM, Anthony Chan <chan at mcs.anl.gov>
>> wrote:
>>>
>>>
>>> ----- Original Message -----
>>>
>>>> The fact that I need this to work for the much larger problem is
>>>> why simply changing static arrays to dynamic equivalents is,
>>>> unfortunately, not a viable solution.
>>>
>>> I didn't check your program, but based on your description of the
>>> problem, you may want to increase the stack size of your threaded
>>> program. Try setting "ulimit -s" and "KMP_STACKSIZE" to what your
>>> program needs; see the ifort manpage for KMP_STACKSIZE's definition.
>>>
>>> A.Chan
>>
>>
>>
>> --
>> Jeff Hammond
>> Argonne Leadership Computing Facility
>> University of Chicago Computation Institute
>> jhammond at alcf.anl.gov / (630) 252-5381
>> http://www.linkedin.com/in/jeffhammond
>> https://wiki-old.alcf.anl.gov/index.php/User:Jhammond