[mpich-discuss] Fwd: Possible setup problem

Darius Buntinas buntinas at mcs.anl.gov
Wed Apr 27 15:50:39 CDT 2011


Forwarding to mpich-discuss in case others are interested.

-d


Begin forwarded message:

> From: Darius Buntinas <buntinas at mcs.anl.gov>
> Date: April 27, 2011 1:58:46 PM CDT
> To: Andy_Holland at URSCorp.com
> Subject: Re: [mpich-discuss] Possible setup problem
> 
> 
> I think I found the problem.  I should have checked this earlier.  It looks like your machines are set up to return 127.0.0.1 (the loopback address) when resolving their own hostname, rather than their actual IP address.
> 
> Try this on s051rhlapp01:
>  hostname
> It should return s051rhlapp01.  Then try:
>  host s051rhlapp01
> It should NOT return 127.0.0.1.  Then try the same thing on s051rhlapp02 (using its own name).
> 
> If you don't get what you should, it indicates a problem with your network configuration.
> 
> -d
> 
> On Apr 26, 2011, at 5:04 PM, Andy_Holland at URSCorp.com wrote:
> 
>> 
>> Here ya go. 
>> 
>> 
>> 
>> Andy Holland
>> Air Quality Modeler
>> URS Corporation
>> 1600 Perimeter Park Drive
>> Suite 400
>> Morrisville, NC 27560
>> Direct: (303) 796-4694
>> Cell: (919) 619-4218
>> Fax: (919) 461-1415
>> andy_holland at urscorp.com 
>> 
>> This e-mail and any attachments contain URS Corporation confidential information that may be proprietary or privileged. If you receive this message in error or are not the intended recipient, you should not retain, distribute, disclose or use any of this information and you should destroy the e-mail and any attachments or copies.
>> 
>> 
>> 
>> 
>> 
>> 
>> Darius Buntinas <buntinas at mcs.anl.gov>
>> 04/26/2011 05:56 PM 
>> 
>> To
>> Andy_Holland at URSCorp.com
>> cc
>> Subject
>> Re: [mpich-discuss] Possible setup problem
>> 
>> 
>> 
>> 
>> 
>> Oops, I forgot to mention that you need to recompile the simple_test file:
>> 
>> mpicc simple_test.c -o simple_test
>> 
>> Can you try it again?
>> 
>> Thanks,
>> -d
>> 
>> On Apr 26, 2011, at 3:45 PM, Andy_Holland at URSCorp.com wrote:
>> 
>>> 
>>> Ok, I applied the patch and rebuilt both installations and reran your test program.  Attached is the log file. 
>>> 
>>> 
>>> 
>>> Thank you, 
>>> 
>>> Andy Holland
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Darius Buntinas <buntinas at mcs.anl.gov>
>>> 04/26/2011 02:20 PM 
>>> 
>>> To
>>> Andy_Holland at URSCorp.com
>>> cc
>>> Subject
>>> Re: [mpich-discuss] Possible setup problem
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Hmm.  I found a bug with error reporting.  While this won't directly fix your problem, it may help with identifying it.
>>> 
>>> Can you apply this patch, then rebuild and re-install mpich2 on both machines?
>>> 
>>>   (from the mpich2 source directory)
>>>   patch -p0 < errno.patch
>>>   make clean
>>>   make
>>>   make install
>>> 
>>> Then try the simple_test.c again and send us the log.
>>> 
>>> Thanks,
>>> -d
>>> 
>>> [attachment "errno.patch" deleted by Andy Holland/Denver/URSCorp] 
>>> 
>>> On Apr 26, 2011, at 11:28 AM, Andy_Holland at URSCorp.com wrote:
>>> 
>>>> 
>>>> Ok, I turned iptables off on both machines and reran it.  Attached is the log file. 
>>>> 
>>>> 
>>>> 
>>>> Andy Holland
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Darius Buntinas <buntinas at mcs.anl.gov> 
>>>> Sent by: mpich-discuss-bounces at mcs.anl.gov
>>>> 04/26/2011 11:13 AM
>>>> Please respond to
>>>> mpich-discuss at mcs.anl.gov
>>>> 
>>>> 
>>>> To
>>>> mpich-discuss at mcs.anl.gov
>>>> cc
>>>> Subject
>>>> Re: [mpich-discuss] Possible setup problem
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> For some reason, it's not showing the specific socket error, but it's happening when a process on s051rhlapp02 tries to send a message to a process on s051rhlapp01.  Can you try disabling the firewalls on the machines and try it again?
>>>> 
>>>> Thanks,
>>>> -d
>>>> 
>>>> On Apr 25, 2011, at 5:39 PM, Andy_Holland at URSCorp.com wrote:
>>>> 
>>>>> 
>>>>> Yeah, I put it in the wrong directory.  Ok, I reran in a shared area and I've attached the log file. 
>>>>> 
>>>>> 
>>>>> 
>>>>> Thanks, 
>>>>> 
>>>>> Andy Holland
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Darius Buntinas <buntinas at mcs.anl.gov> 
>>>>> Sent by: mpich-discuss-bounces at mcs.anl.gov
>>>>> 04/25/2011 05:45 PM
>>>>> Please respond to
>>>>> mpich-discuss at mcs.anl.gov
>>>>> 
>>>>> 
>>>>> To
>>>>> mpich-discuss at mcs.anl.gov
>>>>> cc
>>>>> Subject
>>>>> Re: [mpich-discuss] Possible setup problem
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Andy,
>>>>> 
>>>>> Looking through the log file, I see a line that says:
>>>>> 
>>>>> [proxy:0:1 at s051rhlapp02] launch_procs (/usr/local/mpich2-1.3.2p1/src/pm/hydra/pm/pmiserv/pmip_cb.c:639): unable to change wdir to /home/andy_holland/mpich_test (No such file or directory)
>>>>> 
>>>>> Can you check that you can access /home/andy_holland/mpich_test from s051rhlapp02?
>>>>> 
>>>>> If not, put simple_test into a directory that's accessible from both machines, and try it again.
>>>>> 
>>>>> Thanks,
>>>>> -d
>>>>> 
>>>>> On Apr 25, 2011, at 3:55 PM, Andy_Holland at URSCorp.com wrote:
>>>>> 
>>>>>> 
>>>>>> Darius, 
>>>>>>        Thanks.  If I had just thought for a second longer I would have had it.  Attached is the log file for your test program. 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Andy Holland
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Darius Buntinas <buntinas at mcs.anl.gov> 
>>>>>> Sent by: mpich-discuss-bounces at mcs.anl.gov
>>>>>> 04/25/2011 04:32 PM
>>>>>> Please respond to
>>>>>> mpich-discuss at mcs.anl.gov
>>>>>> 
>>>>>> 
>>>>>> To
>>>>>> mpich-discuss at mcs.anl.gov
>>>>>> cc
>>>>>> Subject
>>>>>> Re: [mpich-discuss] Possible setup problem
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Sorry.  Just run:
>>>>>>   mpicc simple_test.c -o simple_test
>>>>>> 
>>>>>> If you needed to specify the full path for mpiexec, use the same path for mpicc.  This will generate the executable called simple_test.
>>>>>> 
>>>>>> -d
>>>>>> 
>>>>>> 
>>>>>> On Apr 25, 2011, at 3:26 PM, Andy_Holland at URSCorp.com wrote:
>>>>>> 
>>>>>>> 
>>>>>>> Darius, 
>>>>>>>        Thanks for your help with this.  You'll have to forgive me though, I'm a Fortran programmer and I'm not exactly sure how to compile the program you sent me.  I have gcc by the way. 
>>>>>>> 
>>>>>>> Thanks, 
>>>>>>> 
>>>>>>> Andy Holland
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Darius Buntinas <buntinas at mcs.anl.gov> 
>>>>>>> Sent by: mpich-discuss-bounces at mcs.anl.gov
>>>>>>> 04/25/2011 03:19 PM
>>>>>>> Please respond to
>>>>>>> mpich-discuss at mcs.anl.gov
>>>>>>> 
>>>>>>> 
>>>>>>> To
>>>>>>> mpich-discuss at mcs.anl.gov
>>>>>>> cc
>>>>>>> Subject
>>>>>>> Re: [mpich-discuss] Possible setup problem
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> OK, can you try the attached test program with the same number of processes and machine file, but also add the -l option to mpiexec (to label the lines of output with the rank)?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> -d
>>>>>>> 
>>>>>>> 
>>>>>>> [attachment "simple_test.c" deleted by Andy Holland/Denver/URSCorp] 
>>>>>>> On Apr 25, 2011, at 2:00 PM, Andy_Holland at URSCorp.com wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> I've attached the log for running cpi using the same machinefile. 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thank you, 
>>>>>>>> 
>>>>>>>> Andy Holland
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Darius Buntinas <buntinas at mcs.anl.gov> 
>>>>>>>> Sent by: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>> 04/25/2011 02:51 PM
>>>>>>>> Please respond to
>>>>>>>> mpich-discuss at mcs.anl.gov
>>>>>>>> 
>>>>>>>> 
>>>>>>>> To
>>>>>>>> mpich-discuss at mcs.anl.gov
>>>>>>>> cc
>>>>>>>> Subject
>>>>>>>> Re: [mpich-discuss] Possible setup problem
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hi Andy,
>>>>>>>> 
>>>>>>>> Can you try running cpi from the examples directory of the MPICH2 source tree with the same number of processes and the same machine file?  Let us know if that works, and, if not, send us the output, please.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> -d
>>>>>>>> 
>>>>>>>> On Apr 25, 2011, at 1:30 PM, Andy_Holland at URSCorp.com wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> It was suggested that I send out all the error messages.  I've attached a log file from the model. 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thank you, 
>>>>>>>>> 
>>>>>>>>> Andy Holland
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Dave Goodell <goodell at mcs.anl.gov> 
>>>>>>>>> Sent by: mpich-discuss-bounces at mcs.anl.gov
>>>>>>>>> 04/25/2011 02:22 PM
>>>>>>>>> Please respond to
>>>>>>>>> mpich-discuss at mcs.anl.gov
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> To
>>>>>>>>> mpich-discuss at mcs.anl.gov
>>>>>>>>> cc
>>>>>>>>> Subject
>>>>>>>>> Re: [mpich-discuss] Possible setup problem
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Apr 25, 2011, at 12:59 PM CDT, Andy_Holland at URSCorp.com wrote:
>>>>>>>>> 
>>>>>>>>>> When I run from either machine using CPUs from both machines the run stops with many mpi messages.  Below is the last message in the list: 
>>>>>>>>>> 
>>>>>>>>>> main (/usr/local/mpich2-1.3.2p1/src/pm/hydra/ui/mpich/mpiexec.c:404): process manager error waiting for completion 
>>>>>>>>> 
>>>>>>>>> Can you send us all of the error messages?  Typically the first error messages are the most useful/relevant; the last ones often are just messages announcing some sort of cleanup or secondary error caused by the original error.
>>>>>>>>> 
>>>>>>>>> -Dave
>>>>>>>>> 
>>>>>>>>> _______________________________________________
>>>>>>>>> mpich-discuss mailing list
>>>>>>>>> mpich-discuss at mcs.anl.gov
>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>>>>>>> 
>>>>>>>>> <run.cctm.parallel.txt>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> <cpi_log.txt>
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> <simple_test_log.txt>
>>>>> 
>>>>> 
>>>>> <simple_test_log.txt>
>>>> 
>>>> 
>>>> <simple_test_log.txt>
>>> 
>>> 
>>> <simple_test_log.txt>
>> 
>> 
>> <simple_test_log.txt>
> 

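The hostname check described in the forwarded message (run `hostname`, then confirm the name does not resolve to 127.0.0.1) can also be scripted. The following is only a minimal sketch, not part of MPICH2 or of the original thread: the function names `is_loopback` and `resolve_own_hostname` are made up for illustration, and the script uses the system resolver (which consults /etc/hosts as well as DNS), so its answer may differ from what the `host` command reports.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the address is a loopback address (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(address).is_loopback


def resolve_own_hostname():
    """Return this machine's hostname and the IPv4 addresses it resolves to."""
    name = socket.gethostname()
    # gethostbyname_ex returns (canonical name, alias list, IPv4 address list).
    _, _, addresses = socket.gethostbyname_ex(name)
    return name, addresses


if __name__ == "__main__":
    try:
        name, addresses = resolve_own_hostname()
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        print(f"could not resolve own hostname: {exc}")
    else:
        for address in addresses:
            if is_loopback(address):
                verdict = "loopback, other hosts cannot connect to this address"
            else:
                verdict = "ok"
            print(f"{name} -> {address}: {verdict}")
```

Run it once on each machine in the machine file. If any line reports a loopback address, the machine is probably resolving its own name the way the message above describes.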

