[mpich-discuss] Dual core problem

huraj at ucm.sk huraj at ucm.sk
Wed Apr 14 02:45:50 CDT 2010


Hi,

I experimented with smpd in debug mode (smpd -d), and my projects run
without problems in this mode.
I tried my hello program as well, and it ran without errors.

But when I restarted smpd in normal mode, the errors came back.
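
Since the failures are intermittent, it may help to measure how often a run succeeds in each smpd mode instead of judging from a handful of manual runs. The following is only a rough sketch: the helper name `success_rate` is made up for illustration, and it assumes that a run counts as failed when the exit code is non-zero or when the text "ERROR:" (as in the smpd messages quoted below) appears in the output.

```python
import subprocess


def success_rate(command, trials=20, timeout=60):
    """Run `command` `trials` times and return the fraction of clean runs.

    A run is counted as clean only if it exits with code 0 and its combined
    stdout/stderr does not contain the marker b"ERROR:" (an assumption based
    on the smpd error lines in this thread).
    """
    clean = 0
    for _ in range(trials):
        try:
            result = subprocess.run(
                command,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                timeout=timeout,
            )
        except (OSError, subprocess.TimeoutExpired):
            # A missing executable or a hung run is counted as a failure.
            continue
        if result.returncode == 0 and b"ERROR:" not in result.stdout:
            clean += 1
    return clean / trials


# Example call (command line taken from this thread; adjust as needed):
# print(success_rate(["mpiexec", "-n", "3", "hello.exe"]))
```

Running the same call once with smpd in normal mode and once in debug mode would give two comparable numbers.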

So the workaround for me is to run smpd in debug mode (I do not know why,
but it works).
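
One small clue from the abort output quoted below: rank 2 exited with code -1073741819. Read as an unsigned 32-bit value, that is 0xC0000005, which is the Windows status code for an access violation. The snippet below is just a sketch of that conversion; the function names and the one-entry lookup table are illustrative, and the conversion says nothing about where the access violation comes from.

```python
# Illustrative lookup table; only the one code discussed here is listed.
KNOWN_NTSTATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def to_unsigned_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)


def describe_exit_code(exit_code):
    """Return the hex form plus a name if the code is in the table above."""
    name = KNOWN_NTSTATUS.get(exit_code & 0xFFFFFFFF, "not in this table")
    return "{} ({})".format(to_unsigned_hex(exit_code), name)


print(describe_exit_code(-1073741819))  # prints 0xC0000005 (STATUS_ACCESS_VIOLATION)
```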

See the attached outputs from

Thank you very much

BR,
Ladislav


> Hi,
>  Please provide us with the following logs:
>
> # Stop any running instances of smpd using the command "smpd -stop".
> # Start smpd in debug mode and collect the debug log using the command,
> "smpd -d > smpd.log"
> # Launch your job in debug mode and collect the debug log using the
> command, "mpiexec -verbose -n 2 cpi.exe > mpiexec.log"
> # Stop smpd in debug mode (Ctrl-C)
> # restart smpd using the command, "smpd -start"
> # Provide us with mpiexec.log and smpd.log (and any error output)
>
> Regards,
> Jayesh
> ----- Original Message -----
> From: "Ladislav Huraj" <ladislav.huraj at ucm.sk>
> To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
> Cc: mpich-discuss at mcs.anl.gov
> Sent: Monday, April 12, 2010 2:35:47 AM GMT -06:00 US/Canada Central
> Subject: Re: [mpich-discuss] Dual core problem
>
> Hi,
> I am sure my version of Windows is 32-bit; I installed it myself and
> checked it as well.
> Is it possible to make mpiexec (or smpd) run on only one core?
>
> Thanks
>
> Regards,
>
> Ladislav Huraj
>
>
> Jayesh Krishna wrote:
>> Hi,
>>  Are you sure that you have the 32-bit version of XP (and not the
>> 64-bit one) installed on your system?
>>
>> (PS: AFAIK, Core 2 Duos support both 32-bit and 64-bit versions of
>> Windows)
>> Regards,
>> Jayesh
>> ----- Original Message -----
>> From: huraj at ucm.sk
>> To: jayesh at mcs.anl.gov
>> Cc: mpich-discuss at mcs.anl.gov
>> Sent: Saturday, April 10, 2010 2:04:13 PM GMT -06:00 US/Canada Central
>> Subject: Re: [mpich-discuss] Dual core problem
>>
>> Hi,
>>
>>
>>> 1) Uninstall MPICH2 on your machine.
>>> 2) Install mpich2-1.2.1p1 on your machine (Make sure that your OS is
>>> 32-bit before installing 32-bit version of MPICH2 - Start->Control
>>> Panel->System)
>>
>> This did not help; the results were the same.
>>
>>
>>> 3) Re-compile C:\program files\mpich2\examples\icpi.c (cpi.vcproj)
>>>
>> Done
>>
>>
>>> 4) Run "smpd -status" to get the status of smpd
>>>
>> "smpd running on oo7note"
>> oo7note is name of my notebook
>>
>>
>>> 5) Try running cpi.exe as "c:\progra~1\mpich2\bin\mpiexec.exe -n 2
>>> c:\progra~1\mpich2\examples\cpi.exe"
>>
>> Unfortunately, the results are the same as before; the program runs
>> correctly in only one of 6 cases.
>>
>>
>>> 6) Run "smpd -version" to get the version of the process manager.
>>>
>>
>> "1.2.1p1"
>>
>>
>>> 7) Type "winver" at the command prompt to get the complete version of
>>> your OS.
>>
>> "Microsoft Windows
>> Version 5.1 Service Pack 3 (2600.xpsp_sp3_gdr.091208-2036)"
>>
>> I do not know what else I could try.
>>
>> Regards,
>> Ladislav
>>
>>
>>
>>> Hi,
>>>  I would recommend the following (to make sure that your installation is
>>> fine):
>>>
>>> 1) Uninstall MPICH2 on your machine.
>>> 2) Install mpich2-1.2.1p1 on your machine (Make sure that your OS is
>>> 32-bit before installing 32-bit version of MPICH2 - Start->Control
>>> Panel->System)
>>> 3) Re-compile C:\program files\mpich2\examples\icpi.c (cpi.vcproj)
>>> 4) Run "smpd -status" to get the status of smpd
>>> 5) Try running cpi.exe as "c:\progra~1\mpich2\bin\mpiexec.exe -n 2
>>> c:\progra~1\mpich2\examples\cpi.exe"
>>> 6) Run "smpd -version" to get the version of the process manager.
>>> 7) Type "winver" at the command prompt to get the complete version of
>>> your OS.
>>>
>>>  Let us know the results. Please provide as many details as possible
>>> (the more details you provide, the easier it is for us to debug your
>>> problem).
>>>
>>> Regards,
>>> Jayesh
>>> ----- Original Message -----
>>> From: huraj at ucm.sk
>>> To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
>>> Cc: "Ladislav Huraj" <ladislav.huraj at ucm.sk>, mpich-discuss at mcs.anl.gov
>>> Sent: Friday, April 9, 2010 2:19:55 PM GMT -06:00 US/Canada Central
>>> Subject: Re: [mpich-discuss] Dual core problem
>>>
>>> Hi,
>>>
>>> cpi does not work (or rather, it works in about 1 of 6 runs)
>>> "mpiexec -n 3 hello.exe" does not work (with the same probability)
>>> "mpiexec -n 1 hostname" works perfectly (it works without problems for
>>> every n); it is the first command that runs without problems
>>>
>>> It seems that the higher the number n, the lower the probability of
>>> correct output.
>>>
>>> Ladislav
>>>
>>>
>>>>  Please provide us more details. Can you run cpi? Can you run
>>>> "mpiexec -n 3 hello.exe"? Can you run "mpiexec -n 1 hostname"?
>>>>
>>>> -Jayesh
>>>> ----- Original Message -----
>>>> From: "Ladislav Huraj" <ladislav.huraj at ucm.sk>
>>>> To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
>>>> Cc: mpich-discuss at mcs.anl.gov
>>>> Sent: Friday, April 9, 2010 11:41:55 AM GMT -06:00 US/Canada Central
>>>> Subject: Re: [mpich-discuss] Dual core problem
>>>>
>>>> Unfortunately it does not work
>>>>
>>>> BR,
>>>> Ladislav
>>>>
>>>>
>>>> Jayesh Krishna wrote:
>>>>
>>>>> Hi,
>>>>>  Can you run cpi (c:\program files\mpich2\examples\cpi.exe) on your
>>>>> notebook?
>>>>>  Does "mpiexec -n 3 hello.exe" work for you (mpiexec launches all
>>>>> procs on the localhost by default)?
>>>>>
>>>>> Regards,
>>>>> Jayesh
>>>>> ----- Original Message -----
>>>>> From: huraj at ucm.sk
>>>>> To: mpich-discuss at mcs.anl.gov
>>>>> Sent: Friday, April 9, 2010 6:24:05 AM GMT -06:00 US/Canada Central
>>>>> Subject: Re: [mpich-discuss] Dual core problem
>>>>>
>>>>> My version is 1.2.1p1. I have already tried older as well as newer
>>>>> versions of mpich2, but without success.
>>>>>
>>>>> To run the job I use, e.g.:
>>>>> mpiexec -hosts 3 localhost localhost localhost hello.exe
>>>>> This one works perfectly with the MPI code on my other PC.
>>>>> I tried lots of other options which are correct (I always checked
>>>>> their correctness on my other PC). From this I deduced that the
>>>>> problem seems to be in the notebook, not in the code. Oddly, the
>>>>> output is correct in about one of 4 cases.
>>>>>
>>>>> My OS is 32-bit WinXP Professional SP3; the machine is an HP notebook
>>>>> with an Intel Core2 Duo CPU.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Hi,
>>>>>>  Which version of MPICH2 are you using? (If you are using an older
>>>>>> version, try the latest stable version and see if it helps -
>>>>>> http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads)
>>>>>>  How are you running your job (mpiexec options)?
>>>>>>  Is your machine 32-bit or 64-bit?
>>>>>>
>>>>>> Regards,
>>>>>> Jayesh
>>>>>> ----- Original Message -----
>>>>>> From: huraj at ucm.sk
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Sent: Friday, April 9, 2010 2:40:31 AM GMT -06:00 US/Canada Central
>>>>>> Subject: [mpich-discuss] Dual core problem
>>>>>>
>>>>>> When I run the MPI program, the output varies. Sometimes the output
>>>>>> is correct, sometimes I get an error message, and sometimes the job
>>>>>> aborts.
>>>>>> See the outputs:
>>>>>>
>>>>>> [01:4088]......ERROR:result command received but the wait_list is empty.
>>>>>> [01:4088]....ERROR:unable to handle the command: "cmd=result src=1 dest=1 tag=5 cmd_tag=2 cmd_orig=dbput ctx_key=1 result=DBS_SUCCESS "
>>>>>> [01:4088]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
>>>>>> [01:2392]......ERROR:result command received but the wait_list is empty.
>>>>>> [01:2392]....ERROR:unable to handle the command: "cmd=result src=1 dest=1 tag=9 cmd_tag=2 cmd_orig=dbput ctx_key=2 result=DBS_SUCCESS "
>>>>>> [01:2392]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
>>>>>>
>>>>>> or
>>>>>> [01:1452]......ERROR:result command received but the wait_list is empty.
>>>>>> [01:1452]....ERROR:unable to handle the command: "cmd=result src=1 dest=1 tag=4 cmd_tag=1 cmd_orig=dbget ctx_key=0 result=DBS_FAIL "
>>>>>>
>>>>>> job aborted:
>>>>>> rank: node: exit code[: error message]
>>>>>> 0: localhost: 123
>>>>>> 1: localhost: 123
>>>>>> 2: localhost: -1073741819: process 2 exited without calling finalize
>>>>>>
>>>>>> or correctly
>>>>>> Received: Hello, world from process 1!
>>>>>> Received: Hello, world from process 2!
>>>>>> MASTER: All Done!
>>>>>>
>>>>>> The program code is correct; it runs well on a different PC.
>>>>>>
>>>>>> I am afraid the problem is in my notebook. The notebook has an Intel
>>>>>> Core2 Duo CPU.
>>>>>> I tried changing the 'hosts' setting in wmpiconfig (to
>>>>>> localhost:2) for dual core, but nothing changed.
>>>>>> I need it only for local use.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Ladislav
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> __________ Information from ESET Mail Security, version of virus
>>>>>> signature
>>>>>> database 5012 (20100409) __________
>>>>>>
>>>>>> The message was checked by ESET Mail Security.
>>>>>> http://www.eset.com
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> mpich-discuss mailing list
>>>>>> mpich-discuss at mcs.anl.gov
>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss




-------------- next part --------------
A non-text attachment was scrubbed...
Name: smpd.log
Type: application/octet-stream
Size: 1167443 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20100414/98dc447d/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpiexec.log
Type: application/octet-stream
Size: 46090 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20100414/98dc447d/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hello.log
Type: application/octet-stream
Size: 143779 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20100414/98dc447d/attachment-0005.obj>

