[MPICH] mpich2 problem with MCNP5

David Ashton ashton at mcs.anl.gov
Mon Jul 25 19:42:05 CDT 2005


bastian,

 

The MPICH2 error reports a stack overflow.  I suspect this is a problem
with the way the Fortran application was compiled.  I don’t remember the
details off the top of my head, but I believe that if you tell the compiler
to increase the amount of global memory (so you can have a large static
array, for instance), this also affects the amount of memory available for
the thread stacks.  The only way to resolve it is to allocate memory from
the heap instead of using large static variables, and not to increase the
global memory space.
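
As a rough illustration of the heap-versus-static distinction, here is a
minimal C sketch.  MCNP5 itself is Fortran, where the analogous change would
be an ALLOCATABLE array sized with ALLOCATE at run time; the function name
sum_on_heap and the element count are made up for this example and do not
come from MCNP5 or MPICH2.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of MCNP5 or MPICH2.
 * Allocates a work array of n doubles on the heap at run time instead of
 * declaring it as a large static array, fills it with 1.0, and returns the
 * sum of the elements.  Returns -1.0 if n is zero or the allocation fails. */
double sum_on_heap(size_t n)
{
    if (n == 0) {
        return -1.0;
    }

    /* Heap allocation: the size is chosen at run time and does not
     * enlarge the program's static data segment. */
    double *work = malloc(n * sizeof *work);
    if (work == NULL) {
        fprintf(stderr, "could not allocate %zu doubles\n", n);
        return -1.0;
    }

    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        work[i] = 1.0;
        total += work[i];
    }

    free(work);  /* release the memory once it is no longer needed */
    return total;
}
```

Calling sum_on_heap(1000000) would allocate roughly 8 MB from the heap for
the duration of the call.  Whether this kind of change is what the MCNP5
build actually needs depends on how it was compiled.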

 

I may be way off in my diagnosis.  What compiler options or #pragmas
did you use to compile the application?

 

-David Ashton

 

  _____  

From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Vogt, Bastian
Sent: Monday, July 25, 2005 5:27 AM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] mpich2 problem with MCNP5

 

Dear colleagues,

 

I am trying to do some core calculations using the MPI parallel option of
MCNP5.  For this I have an input file with roughly 70,000 cells.

It runs without any problems on my local machine, using about 600MB of
RAM.

When I try to execute the same job with MPI on one or more machines, I get
the following error:

 

 

With mpich:

cp0 =   4.21

forrtl: severe (170): Program Exception - stack overflow

 

Image              PC        Routine            Line        Source
MCNP5mpi.exe       005AAD1A  Unknown               Unknown  Unknown
MCNP5mpi.exe       005A07DB  Unknown               Unknown  Unknown
MCNP5mpi.exe       004DE9AF  Unknown               Unknown  Unknown
MCNP5mpi.exe       004D2714  Unknown               Unknown  Unknown
MCNP5mpi.exe       005F7699  Unknown               Unknown  Unknown
MCNP5mpi.exe       005E9EBA  Unknown               Unknown  Unknown
kernel32.dll       7C816D4F  Unknown               Unknown  Unknown

Error 64, process 1, host IKET127154:

   GetQueuedCompletionStatus failed for socket 0 connected to host '141.52.127.137'

 

 

 

 

With mpich2:

cp0 =   4.23

forrtl: severe (170): Program Exception - stack overflow

Image              PC        Routine            Line        Source
MCNP5mpi.exe       005AAD1A  Unknown               Unknown  Unknown
MCNP5mpi.exe       005A07DB  Unknown               Unknown  Unknown
MCNP5mpi.exe       004DE9AF  Unknown               Unknown  Unknown
MCNP5mpi.exe       004D2714  Unknown               Unknown  Unknown
MCNP5mpi.exe       005F7699  Unknown               Unknown  Unknown
MCNP5mpi.exe       005E9EBA  Unknown               Unknown  Unknown
kernel32.dll       7C816D4F  Unknown               Unknown  Unknown

 

I tried running a smaller job on the cluster and it works fine, so it
should not be a problem with a misconfigured MPI. All computers in the
cluster have about 1GB of RAM, so it should not be the RAM either. I don’t
have any idea how to overcome this problem, so I would be glad of some help!!

 

Does anyone have an idea how to fix this problem?

 

Thank you all in advance!!

bastian

 

_______________________________________

 

Bastian Vogt

 

EnBW Kraftwerke AG 

 

Institut für Kern- und Energietechnik (IKET)
Forschungszentrum Karlsruhe 
Telefon: +49-(0)7247-82-5047
Fax.: +49-(0)7247-82-6323

 

 


mailto: bastian.vogt at iket.fzk.de

 
