RE: [MPICH] mpich2 problem with MCNP5

Vogt, Bastian Bastian.Vogt at iket.fzk.de
Mon Aug 1 03:57:32 CDT 2005


Hi David,
 
In fact, I did not compile the application, since MCNP5 comes with a precompiled MPI executable.
But I fixed the problem by patching MCNP5_MPI.EXE with the editbin tool (Microsoft Visual Studio) to use a bigger stack size.
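 
In case someone runs into the same issue: editbin takes the new stack reserve size in bytes as /STACK:reserve, so the patch is essentially a single command along these lines (the 256 MB value here is only an example, not necessarily what MCNP5 needs):
 
editbin /STACK:268435456 MCNP5_MPI.EXE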
 
Now a new problem is showing up:
When MPI initializes the slave processes, I get the following error message:
 
master starting       4 tasks with       1 threads each  07/29/05 16:03:38 
 master sending static commons...
 master sending dynamic commons...
[pe:0] **DOTCOMM Error** DOTCOMMI_PACK (C:\mcnp\Install_RSICC_1.30\MCNP5\Source\dotcomm\src\internals\mpi\dotcommi_pack.c:104[!(( dotcommp_sbuf.data != ((void *)0) ))]) (alloc(dotcommp_sbuf.data)) (-280776228)
 
I figured out that this routine, dotcommi_pack.c, is the one that allocates the send buffer. So this is a file from the MCNP5 installation and not an MPICH file. But anyway, if you have any suggestion on how to fix the problem, please send me an e-mail.
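 
For what it is worth, the failed assertion reads like a NULL check after the send-buffer allocation, and the negative number printed at the end might mean the computed buffer size wrapped around a 32-bit integer for a problem of this size, though I cannot tell that from the message alone. The following is only a hypothetical C sketch of that kind of failure (all names invented), not the actual dotcommi_pack.c code:
 
/* hypothetical sketch of the failing pattern, not the real dotcomm source */
#include <assert.h>
#include <stdlib.h>

struct send_buffer {
    void  *data;
    size_t size;
};

static struct send_buffer sbuf;

void pack_alloc(int ncells, int bytes_per_cell)
{
    /* if this 32-bit product wraps around, the request becomes negative */
    int nbytes = ncells * bytes_per_cell;

    sbuf.data = malloc((size_t)nbytes);   /* a huge bogus request makes malloc return NULL */
    assert(sbuf.data != NULL);            /* this is the kind of check that fires */
    sbuf.size = (size_t)nbytes;
}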
 
Thanks in advance
 
bastian
_______________________________________
 
Bastian Vogt
 
EnBW Kraftwerke AG 
 
Institut für Kern- und Energietechnik (IKET)
Forschungszentrum Karlsruhe 
Phone: +49-(0)7247-82-5047
Fax: +49-(0)7247-82-6323
 
mailto: bastian.vogt at iket.fzk.de
-----Original Message-----
From: David Ashton [mailto:ashton at mcs.anl.gov] 
Sent: Tuesday, July 26, 2005 02:42
To: Vogt, Bastian; mpich-discuss at mcs.anl.gov
Subject: RE: [MPICH] mpich2 problem with MCNP5
 
bastian,
 
The MPICH2 error states that there is a stack overflow. I suspect this is a problem with the way the Fortran application was compiled. I don't remember the details off the top of my head, but I believe that if you tell the compiler to increase the amount of global memory, for instance so that you can have a large static array, then this also affects the amount of memory available for the thread stacks. The only way to resolve it is to take the memory from the heap instead of using large static variables, and not to increase the global memory space.
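 
The details differ between Fortran and C, but here is a rough C sketch of the stack-versus-heap distinction I mean (not MCNP or MPICH code): a large array declared inside a routine is carved out of the thread's stack and will overflow it, while the same amount of memory requested from the heap either succeeds or fails cleanly:
 
/* generic stack-vs-heap illustration, not MCNP or MPICH code */
#include <stdio.h>
#include <stdlib.h>

#define NCELLS 70000
#define NVALS   1024

void on_stack(void)
{
    double buf[NCELLS][NVALS];   /* ~550 MB automatic array lives on the thread stack */
    buf[0][0] = 1.0;             /* far more than a default 1 MB Windows stack -> overflow */
    printf("%f\n", buf[0][0]);
}

void on_heap(void)
{
    double *buf = malloc((size_t)NCELLS * NVALS * sizeof *buf);
    if (buf == NULL) {           /* heap allocation can fail, but it fails cleanly */
        fprintf(stderr, "out of memory\n");
        return;
    }
    buf[0] = 1.0;
    printf("%f\n", buf[0]);
    free(buf);
}

int main(void)
{
    on_heap();                   /* calling on_stack() would abort with a stack overflow */
    return 0;
}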
 
I may be way off in my diagnosis. What compiler parameters or #pragmas did you use to compile the application?
 
-David Ashton
 

From: owner-mpich-discuss at mcs.anl.gov [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Vogt, Bastian
Sent: Monday, July 25, 2005 5:27 AM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] mpich2 problem with MCNP5
 
Dear colleagues,
 
I am trying to do some core calculations using the MPI parallel option of MCNP5. For this I have an input file with roughly 70,000 cells.
It runs without any problems on my local machine, using about 600 MB of RAM. 
When I try to execute the same job with MPI on one or more machines, I get the following error:
 
 
With mpich:
cp0 =   4.21
forrtl: severe (170): Program Exception - stack overflow
 
Image              PC        Routine            Line        Source             
MCNP5mpi.exe       005AAD1A  Unknown               Unknown  Unknown
MCNP5mpi.exe       005A07DB  Unknown               Unknown  Unknown
MCNP5mpi.exe       004DE9AF  Unknown               Unknown  Unknown
MCNP5mpi.exe       004D2714  Unknown               Unknown  Unknown
MCNP5mpi.exe       005F7699  Unknown               Unknown  Unknown
MCNP5mpi.exe       005E9EBA  Unknown               Unknown  Unknown
kernel32.dll             7C816D4F  Unknown               Unknown  Unknown
Error 64, process 1, host IKET127154:
   GetQueuedCompletionStatus failed for socket 0 connected to host '141.52.127.137'
 
 
 
 
With mpich2:
cp0 =   4.23
forrtl: severe (170): Program Exception - stack overflow
 
Image              PC        Routine            Line        Source             
MCNP5mpi.exe       005AAD1A  Unknown               Unknown  Unknown
MCNP5mpi.exe       005A07DB  Unknown               Unknown  Unknown
MCNP5mpi.exe       004DE9AF  Unknown               Unknown  Unknown
MCNP5mpi.exe       004D2714  Unknown               Unknown  Unknown
MCNP5mpi.exe       005F7699  Unknown               Unknown  Unknown
MCNP5mpi.exe       005E9EBA  Unknown               Unknown  Unknown
kernel32.dll       7C816D4F  Unknown               Unknown  Unknown
 
I tried to run a smaller job on the cluster and it works fine, so it should not be a problem with a wrongly configured MPI. All computers in the cluster have about 1 GB of RAM, so it should not be the RAM either. I don't have any idea how to overcome this problem, so I would be glad of some help!
 
Does anyone have an idea how to fix this problem?
 
Thank you all in advance!!
bastian
 
_______________________________________
 
Bastian Vogt
 
EnBW Kraftwerke AG 
 
Institut für Kern- und Energietechnik (IKET)
Forschungszentrum Karlsruhe 
Phone: +49-(0)7247-82-5047
Fax: +49-(0)7247-82-6323
 
 
 