[mpich-discuss] Issue running MCNPX on small cluster: Error sending static commons.

Matthew Riblett riblem at rpi.edu
Wed Jun 20 14:18:52 CDT 2012


Hello,

I am attempting to run MCNPX in an MPI environment on a small cluster of computers (Dell PowerEdge servers running 64-bit Windows Server 2008 Standard).
I am using the precompiled 64-bit MPI executables from RSICC. 
I've been able to run the program successfully on each of the four test servers when it is configured to use only one host, and I can scale it up to run multiple processes on a single host.
When I attempt to run the program across multiple hosts (e.g., -hosts 4 Mercury-1 Mercury-2 Mercury-3 Mercury-4), it fails with a fatal error:

master starting 3 by 1 subtasks 06/20/12 15:06:29
master sending static commons...
Fatal error in MPI_Send: Other MPI error, error stack
MPI_Send(173)................: MPI_Send(buf=0000000020E00000, count=236236, MPI_PACKED, dest=1, tag=4 MPI_COMM_WORLD) failed
MPIDI_CH3I_Progress(402)........:
MPID_nem_mpich2_blocking_recv(905)...:
MPID_nem_newtcp_module_poll(37)......:
MPID_nem_newtcp_module_connpoll(2656):
gen_cnting_fail_handler(1739)........: connect failed - the semaphore timeout period has expired (errno 121)

job aborted: 
rank: node: exit code[: error message]
0: Mercury-1: 1: process 0 exited without calling finalize
1: Mercury-2: 123
2: Mercury-3: 123
3: Mercury-4: 123

I've looked at several archived posts that describe similar problems, such as http://lists.mcs.anl.gov/pipermail/mpich-discuss/2011-August/010696.html.
In each of those cases, however, the program got past sending the static commons and reached the point where it was sending the dynamic commons; mine fails before that, while sending the static commons.

This is a rather large simulation (~600 MB), and I am wondering whether its size may be playing a role in this error.
When I run the cpi.exe example, the hosts communicate with one another and it executes without any problems.

I don't think this is a firewall issue, as both smpd.exe and mpiexec.exe have been granted exceptions in Windows Firewall.

Thanks in advance,

-- Matt
___
Matthew J. Riblett
Nuclear Engineering Class '12
Rensselaer Polytechnic Institute
Rensselaer Radiation Measurement and Dosimetry Group
American Nuclear Society, Section President
MANE Department Student Advisory Council

Email:    riblem at rpi.edu
Main:     +1.646.843.9596
Mobile:  +1.804.245.0578
Web:      http://riblem.rpians.org
