[MPICH] mpich2 problem with MCNP5
David Ashton
ashton at mcs.anl.gov
Mon Jul 25 19:42:05 CDT 2005
bastian,
The error for MPICH2 states that there is a stack overflow. I suspect this
is a problem with the way the Fortran application was compiled. I don't
remember the details off the top of my head, but I believe that if you tell
the compiler to increase the amount of global memory, for instance so you
can have a large static array, then this also reduces the amount of memory
available for the thread stacks. The only way to resolve it is to use
memory off the heap instead of large static variables, and not to increase
the global memory space.
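To illustrate what I mean, here is a minimal Fortran sketch; the array name
and size are made up for the example, not taken from MCNP5:

    ! Before: a big fixed-size array forces a large static data area:
    !     real :: tally(50000000)
    ! After: request the memory from the heap at run time instead:
    real, allocatable :: tally(:)
    integer :: ierr
    allocate(tally(50000000), stat=ierr)
    if (ierr /= 0) stop 'could not allocate tally array'
    ! ... use tally exactly as before ...
    deallocate(tally)

With an allocatable array the data comes off the heap, so the static data
segment stays small and the thread stacks keep their room.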
I may be way off in my diagnosis. What compiler parameters or #pragmas did
you use to compile the application?
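For what it is worth, the stack reservation recorded in a Windows
executable can also be raised at link time, which sometimes works around
this kind of overflow; the size below is only an example, not a
recommendation:

    link ... /STACK:16777216 ...

or, on an already-built binary:

    editbin /STACK:16777216 MCNP5mpi.exe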
-David Ashton
_____
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Vogt, Bastian
Sent: Monday, July 25, 2005 5:27 AM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] mpich2 problem with MCNP5
Dear colleagues,
I am trying to do some core calculations using the MPI parallel option of
MCNP5. For this I have an input file with roughly 70,000 cells.
It runs without any problems on my local machine, using about 600 MB of
RAM.
When I try to execute the same job with MPI on one or more machines, I get
the following error:
With mpich:
cp0 = 4.21
forrtl: severe (170): Program Exception - stack overflow
Image PC Routine Line Source
MCNP5mpi.exe 005AAD1A Unknown Unknown Unknown
MCNP5mpi.exe 005A07DB Unknown Unknown Unknown
MCNP5mpi.exe 004DE9AF Unknown Unknown Unknown
MCNP5mpi.exe 004D2714 Unknown Unknown Unknown
MCNP5mpi.exe 005F7699 Unknown Unknown Unknown
MCNP5mpi.exe 005E9EBA Unknown Unknown Unknown
kernel32.dll 7C816D4F Unknown Unknown Unknown
Error 64, process 1, host IKET127154:
GetQueuedCompletionStatus failed for socket 0 connected to host
'141.52.127.137'
With mpich2:
cp0 = 4.23
forrtl: severe (170): Program Exception - stack overflow
Image PC Routine Line Source
MCNP5mpi.exe 005AAD1A Unknown Unknown Unknown
MCNP5mpi.exe 005A07DB Unknown Unknown Unknown
MCNP5mpi.exe 004DE9AF Unknown Unknown Unknown
MCNP5mpi.exe 004D2714 Unknown Unknown Unknown
MCNP5mpi.exe 005F7699 Unknown Unknown Unknown
MCNP5mpi.exe 005E9EBA Unknown Unknown Unknown
kernel32.dll 7C816D4F Unknown Unknown Unknown
I tried to run a smaller job on the cluster and it works fine, so it
should not be a problem with a misconfigured MPI. All computers in the
cluster have about 1 GB of RAM, so it should not be the RAM either. I don't
have any idea how to overcome this problem, so I would be glad about some help!!
Does anyone have an idea how to fix it?
Thank you all in advance!!
bastian
_______________________________________
Bastian Vogt
EnBW Kraftwerke AG
Institut für Kern- und Energietechnik (IKET)
Forschungszentrum Karlsruhe
Phone: +49-(0)7247-82-5047
Fax: +49-(0)7247-82-6323
mailto: bastian.vogt at iket.fzk.de