[mpich-discuss] Socket error on Quad-core Windows XP
Jayesh Krishna
jayesh at mcs.anl.gov
Tue Apr 15 09:02:29 CDT 2008
Hi,
If you are seeing any error (even occasionally :) ) during MPI_Finalize(),
it is a bug in MPICH2. Can you send us your MPI program? How often do you
see the errors?
Regards,
Jayesh
-----Original Message-----
From: Gib Bogle [mailto:g.bogle at auckland.ac.nz]
Sent: Monday, April 14, 2008 6:29 PM
To: Jayesh Krishna
Subject: Re: [mpich-discuss] Socket error on Quad-core Windows XP
Thanks. (Why didn't I do this before??) Almost no errors now. Very
occasionally I see "unable to read the cmd header on the left context,
socket connection closed" on MPI_FINALIZE(). No more 10093 errors.
Cheers
Gib
Jayesh Krishna wrote:
> Hi,
> Run your MPI program as "mpiexec -n 3 simple.exe" (or, to use all
> 4 cores/procs, "mpiexec -n 4 simple.exe").
>
> Regards,
> Jayesh
> -----Original Message-----
> From: Gib Bogle [mailto:g.bogle at auckland.ac.nz]
> Sent: Monday, April 14, 2008 2:44 PM
> To: Jayesh Krishna
> Subject: RE: [mpich-discuss] Socket error on Quad-core Windows XP
>
> If you can tell me of another way to access the 4 cores on my machine,
> I'll try it.
>
> Gib
>
> Quoting Jayesh Krishna <jayesh at mcs.anl.gov>:
>
>> Hi,
>> Is there any reason you want to use the "-localonly" option ?
>>
>> (PS: The "closesocket()" errors that you see are due to a subtle bug
>> in the smpd state machine where a socket is closed twice. This should
>> not affect your program. We will fix this bug in the next release.
>> These messages show up since you are using the "-localonly" option.)
>>
>> Regards,
>> Jayesh
>> -----Original Message-----
>> From: owner-mpich-discuss at mcs.anl.gov
>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Gib Bogle
>> Sent: Sunday, April 13, 2008 8:13 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] Socket error on Quad-core Windows XP
>>
>> I am running on an Intel Core 2 Quad CPU Q6600 PC, under Windows XP.
>> I simply downloaded and installed the latest binary distribution for
>> Windows, mpich2-1.0.7-win32-ia32.msi. There was no configuration that
>> I noticed.
>> The process manager is smpd, which is started automatically.
>> The invocation of my program is:
>>
>> mpiexec -localonly 3 simple.exe
>>
>> A simple version of the code follows. I found a couple of interesting
>> things. There are many more 10093 errors with 3 processors than with 4.
>> Uncommenting the deallocate statement in the main program seems to
>> eliminate 10093 errors, but I still get the occasional 10058 error.
>> It seems that MPICH2 gets upset if memory allocated after MPI_INIT()
>> is not deallocated before MPI_FINALIZE(). Note that errors occur
>> intermittently - not every run.
>>
>> Cheers
>> Gib
>>
>> Code:
>>
>> ! FILE: simple.f90
>> ! This exhibits socket errors
>>
>> module mpitest
>>
>> use mpi
>> IMPLICIT NONE
>>
>> integer, parameter :: NDATA = 100
>> integer, parameter :: NX = 50, NY = NX, NZ = NX
>>
>> type occupancy_type
>> integer :: cdata(NDATA)
>> end type
>>
>> type(occupancy_type), allocatable :: occupancy(:,:,:)
>> integer :: me, my_cell_type
>>
>> contains
>>
>> !-----------------------------------------------------------------------------------------
>> !-----------------------------------------------------------------------------------------
>> subroutine mpi_initialisation
>> integer :: size, ierr, status(MPI_STATUS_SIZE)
>>
>> CALL MPI_INIT(ierr)
>> CALL MPI_COMM_RANK( MPI_COMM_WORLD, me, ierr )
>> CALL MPI_COMM_SIZE( MPI_COMM_WORLD, size, ierr )
>> end subroutine
>>
>> !-----------------------------------------------------------------------------------------
>> !-----------------------------------------------------------------------------------------
>> subroutine array_initialisation
>> integer :: x,y,z,k
>>
>> allocate(occupancy(NX,NY,NZ))
>>
>> k = 0
>> do x = 1,NX
>> do y = 1,NY
>> do z = 1,NZ
>> k = k+1
>> occupancy(x,y,z)%cdata = k
>> enddo
>> enddo
>> enddo
>> end subroutine
>>
>> end module
>>
>> !-----------------------------------------------------------------------------------------
>> !-----------------------------------------------------------------------------------------
>> PROGRAM simple
>>
>> use mpitest
>>
>> integer :: ierr
>>
>> call mpi_initialisation
>>
>> call array_initialisation
>>
>> call MPI_BARRIER ( MPI_COMM_WORLD, ierr )
>>
>> !deallocate(occupancy)
>>
>> write(*,*) 'MPI_FINALIZE: ',me
>>
>> CALL MPI_FINALIZE(ierr)
>>
>> END
>>
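>> For reference, here is a minimal, self-contained sketch of the pattern
>> that avoids the 10093 errors here: the array is allocated after
>> MPI_INIT() and explicitly deallocated before MPI_FINALIZE(). (The
>> program name and the "work" array are just placeholders for
>> illustration, not part of the original program.)
>>
>> ! Sketch only: allocate after MPI_INIT(), deallocate before MPI_FINALIZE()
>> PROGRAM simple_dealloc
>> use mpi
>> implicit none
>> integer :: me, nprocs, ierr
>> integer, allocatable :: work(:)
>>
>> CALL MPI_INIT(ierr)
>> CALL MPI_COMM_RANK( MPI_COMM_WORLD, me, ierr )
>> CALL MPI_COMM_SIZE( MPI_COMM_WORLD, nprocs, ierr )
>>
>> allocate(work(100))        ! memory allocated after MPI_INIT()
>> work = me
>>
>> call MPI_BARRIER ( MPI_COMM_WORLD, ierr )
>>
>> deallocate(work)           ! freed again before MPI_FINALIZE()
>>
>> write(*,*) 'MPI_FINALIZE: ',me
>> CALL MPI_FINALIZE(ierr)
>> END PROGRAM simple_dealloc
>>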
>>
>> Pavan Balaji wrote:
>>> Do you have a very simple (as simple as possible) program that
>>> demonstrates this? Also, can you give some more information about
>>> your installation --
>>>
>>> 1. Which version of MPICH2 are you using?
>>>
>>> 2. What configuration options were passed to MPICH2 during
>>> configuration?
>>>
>>> 3. What process manager are you using?
>>>
>>> 4. What command line did you use to launch the process manager?
>>>
>>> 5. What command line did you use to launch the program?
>>>
>>> 6. Any other information we should probably know about your cluster,
>>> e.g., what OS, is there a firewall between the nodes, etc.
>>>
>>> -- Pavan
>>>
>>> On 04/09/2008 09:49 PM, Gib Bogle wrote:
>>>> My mpich-2 program seems to run correctly, but when it tries to
>>>> execute MPI_Finalize() it gives a range of error messages, all
>>>> apparently related to closing the socket connections. Typical
>>>> messages are:
>>>>
>>>> unable to read the cmd header on the pmi context, socket connection
>>>> closed
>>>>
>>>> shutdown failed, sock ####, error 10093
>>>>
>>>> closesocket failed, sock ####, error 10093
>>>>
>>>> So far I haven't seen any bad consequences from these errors, but
>>>> they are disconcerting. Should I care? Is there something I can do?
>>>>
>>>> Gib
>>>>
>>
>>