[mpich-discuss] Socket error on Quad-core Windows XP

Jayesh Krishna jayesh at mcs.anl.gov
Mon Apr 14 09:10:13 CDT 2008


 Hi,
  Is there any reason you want to use the "-localonly" option?

(PS: The "closesocket()" errors that you see are due to a subtle bug in the
smpd state machine where a socket is closed twice. This should not affect
your program. We will fix this bug in the next release. These messages show
up since you are using the "-localonly" option.)

Regards,
Jayesh
-----Original Message-----
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Gib Bogle
Sent: Sunday, April 13, 2008 8:13 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Socket error on Quad-core Windows XP

I am running on an Intel Core 2 Quad CPU Q6600 PC, under Windows XP.  I
simply downloaded and installed the latest binary distribution for Windows,
mpich2-1.0.7-win32-ia32.msi.  There was no configuration that I noticed.
The process manager is smpd, which is started automatically. 
The invocation of my program is:

mpiexec -localonly 3 simple.exe

A simple version of the code follows.  I found a couple of interesting
things.  There are many more 10093 errors with 3 processors than with 4.
Uncommenting the deallocate statement in the main program seems to eliminate
10093 errors, but I still get the occasional 10058 error.  It seems that
MPICH2 gets upset if memory allocated after MPI_INIT() is not deallocated
before MPI_FINALIZE().  Note that the errors occur intermittently, not on
every run.

Cheers
Gib

Code:

! FILE: simple.f90
! This exhibits socket errors

module mpitest

use mpi
IMPLICIT NONE

integer, parameter :: NDATA = 100
integer, parameter :: NX = 50, NY = NX, NZ = NX

type occupancy_type
     integer :: cdata(NDATA)
end type

type(occupancy_type), allocatable :: occupancy(:,:,:)
integer :: me, my_cell_type

contains

!-----------------------------------------------------------------------------------------
!-----------------------------------------------------------------------------------------
subroutine mpi_initialisation
integer :: size, ierr, status(MPI_STATUS_SIZE)

CALL MPI_INIT(ierr)
CALL MPI_COMM_RANK( MPI_COMM_WORLD, me, ierr )
CALL MPI_COMM_SIZE( MPI_COMM_WORLD, size, ierr )
end subroutine

!-----------------------------------------------------------------------------------------
!-----------------------------------------------------------------------------------------
subroutine array_initialisation
integer :: x,y,z,k

allocate(occupancy(NX,NY,NZ))

k = 0
do x = 1,NX
     do y = 1,NY
          do z = 1,NZ
               k = k+1
               occupancy(x,y,z)%cdata = k
          enddo
     enddo
enddo
end subroutine

end module

!-----------------------------------------------------------------------------------------
!-----------------------------------------------------------------------------------------
PROGRAM simple

use mpitest

integer :: ierr

call mpi_initialisation

call array_initialisation

call MPI_BARRIER ( MPI_COMM_WORLD, ierr )

!deallocate(occupancy)

write(*,*) 'MPI_FINALIZE: ',me

CALL MPI_FINALIZE(ierr)

END
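
For reference, below is a compact single-file sketch of the variant described
above, with the array that was allocated after MPI_INIT() deallocated again
before MPI_FINALIZE().  The file name and the reduced initialisation are
illustrative only, not part of the original program:

! FILE: simple_dealloc.f90  (illustrative name)
! Same structure as simple.f90, but the memory allocated after MPI_INIT()
! is explicitly released before MPI_FINALIZE().
PROGRAM simple_dealloc

use mpi
IMPLICIT NONE

integer, parameter :: NDATA = 100
integer, parameter :: NX = 50, NY = NX, NZ = NX

type occupancy_type
     integer :: cdata(NDATA)
end type

type(occupancy_type), allocatable :: occupancy(:,:,:)
integer :: me, nprocs, ierr

CALL MPI_INIT(ierr)
CALL MPI_COMM_RANK( MPI_COMM_WORLD, me, ierr )
CALL MPI_COMM_SIZE( MPI_COMM_WORLD, nprocs, ierr )

! Allocate after MPI_INIT(), as in the original program.
allocate(occupancy(NX,NY,NZ))
occupancy(1,1,1)%cdata = me   ! token initialisation; the values are irrelevant here

call MPI_BARRIER ( MPI_COMM_WORLD, ierr )

! Release the memory before shutting MPI down - the change that appeared
! to eliminate the 10093 errors.
deallocate(occupancy)

write(*,*) 'MPI_FINALIZE: ',me

CALL MPI_FINALIZE(ierr)

END PROGRAM simple_dealloc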


Pavan Balaji wrote:
> 
> Do you have a very simple (as simple as possible) program that 
> demonstrates this? Also, can you give some more information about your 
> installation --
> 
> 1. Which version of MPICH2 are you using?
> 
> 2. What configuration options were passed to MPICH2 during configuration?
> 
> 3. What process manager are you using?
> 
> 4. What command line did you use to launch the process manager?
> 
> 5. What command line did you use to launch the program?
> 
> 6. Any other information we should probably know about your cluster, 
> e.g., what OS, is there a firewall between the nodes, etc.?
> 
>  -- Pavan
> 
> On 04/09/2008 09:49 PM, Gib Bogle wrote:
>> My mpich-2 program seems to run correctly, but when it tries to 
>> execute MPI_Finalize() it gives a range of error messages, all 
>> apparently related to closing the socket connections.  Typical 
>> messages are:
>>
>> unable to read the cmd header on the pmi context, socket connection 
>> closed
>>
>> shutdown failed, sock ####, error 10093
>>
>> closesocket failed, sock ####, error 10093
>>
>> So far I haven't seen any bad consequences from these errors, but 
>> they are disconcerting.  Should I care?  Is there something I can do?
>>
>> Gib
>>
> 




