[mpich-discuss] Possible setup problem

Andy_Holland at URSCorp.com Andy_Holland at URSCorp.com
Fri Apr 29 10:08:31 CDT 2011


Darius,
        There is quite a bit of output from the program.  When I pipe the 
standard output, the actual program never starts; MPICH messages just fill 
the screen and keep going and going.  It does work just fine if I redirect 
the standard output to a file.

Andy Holland
Air Quality Modeler
URS Corporation
1600 Perimeter Park Drive
Suite 400
Morrisville, NC 27560
Direct: (303) 796-4694
Cell: (919) 619-4218
Fax: (919) 461-1415
andy_holland at urscorp.com


This e-mail and any attachments contain URS Corporation confidential 
information that may be proprietary or privileged. If you receive this 
message in error or are not the intended recipient, you should not retain, 
distribute, disclose or use any of this information and you should destroy 
the e-mail and any attachments or copies.






Darius Buntinas <buntinas at mcs.anl.gov> 
Sent by: mpich-discuss-bounces at mcs.anl.gov
04/29/2011 11:04 AM
Please respond to
mpich-discuss at mcs.anl.gov



To
Andy_Holland at URSCorp.com, mpich-discuss at mcs.anl.gov
cc

Subject
Re: [mpich-discuss] Possible setup problem






[Re-adding mpich-discuss]

Is there a lot of output (e.g., a few pages, or a few MBs)?  The process 
manager is not designed to handle a lot of stdin/stdout traffic.  If you have 
a lot of data, it's better to write it directly to a file.

I think you said this was a Fortran program.  I know there is some 
trickiness with buffered I/O in Fortran.  How do you know the program is 
hanging?  Does the program not finish in the expected time, or do you just 
not see any output in the redirected file when you expect it?  If it's the 
latter, the output may be buffered, in which case you might have to wait 
until the program terminates before you see the output.
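A minimal C sketch of the "write it directly to a file" suggestion follows (C because the test program in this thread is C; the same idea applies to the Fortran model). It assumes each process already knows its rank, for example from MPI_Comm_rank, and gives each rank its own file. The helper names and the prefix.rank.log naming scheme are hypothetical. Line buffering plus an explicit fflush means everything written so far is already in the file if the run later hangs. In Fortran, the FLUSH statement (Fortran 2003) serves the same purpose.

```c
#include <stdio.h>

/* Hypothetical helper: open a separate log file for one process, named
 * "<prefix>.<rank>.log", so each MPI rank writes its output to disk
 * instead of sending it through the process manager's stdout forwarding.
 * The rank is supplied by the caller (e.g. from MPI_Comm_rank); no MPI
 * calls are made here. Returns NULL on failure. */
FILE *open_rank_log(const char *prefix, int rank)
{
    char path[256];
    int n = snprintf(path, sizeof path, "%s.%d.log", prefix, rank);
    if (n < 0 || (size_t)n >= sizeof path)
        return NULL;                          /* name would be truncated */

    FILE *fp = fopen(path, "w");
    if (fp != NULL)
        (void)setvbuf(fp, NULL, _IOLBF, BUFSIZ); /* line buffered */
    return fp;
}

/* Hypothetical helper: write one progress line and flush it right away,
 * so the file shows how far the run got even if it later hangs.
 * Returns 0 on success, nonzero on error. */
int log_progress(FILE *fp, int step, double t)
{
    if (fprintf(fp, "step %d t=%.3f\n", step, t) < 0)
        return -1;
    return fflush(fp);
}
```

For example, rank 3 could call open_rank_log("model", 3) once at startup, call log_progress at each time step, and fclose the file before finishing.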

-d

On Apr 29, 2011, at 9:52 AM, Andy_Holland at URSCorp.com wrote:

> 
> The problem only occurs when I pipe the screen output to a log file.  If I don't do that, it runs fine. 
> 
> Andy Holland
> 
> 
> 
> 
> 
> 
> Darius Buntinas <buntinas at mcs.anl.gov>
> 04/28/2011 05:15 PM 
> 
> To
> Andy_Holland at URSCorp.com
> cc
> Subject
> Re: [mpich-discuss] Possible setup problem
> 
> 
> 
> 
> 
> 
> It looks like the test program worked. 
> 
> Check whether your app works on one node.  Also try other applications to see if they work over two nodes.
> 
> -d
> 
> On Apr 28, 2011, at 3:38 PM, Andy_Holland at URSCorp.com wrote:
> 
> > 
> > We have modified some files on the machines, and now when I do 'host s051rhlapp01' it gives me the actual IP address of the machine.  I've attached the log file for your simple test after this correction.  I think it completed successfully, but wanted to check with you. 
> > 
> > The model I'm trying to run using MPICH now starts off fine, but then hangs at a certain point; I'm not sure whether there is still a problem or not. 
> > 
> > 
> > 
> > Thanks, 
> > 
> > Andy Holland
> > 
> > 
> > 
> > 
> > 
> > 
> > Darius Buntinas <buntinas at mcs.anl.gov>
> > 04/27/2011 05:13 PM 
> > 
> > To
> > Andy_Holland at URSCorp.com
> > cc
> > Subject
> > Re: [mpich-discuss] Possible setup problem
> > 
> > 
> > 
> > 
> > 
> > 
> > The problem is that machine A is unable to determine what its IP address is from its hostname.  So if you run
> >    hostname
> > on machine A, it should return A (or A.foo.com).  Then you should be able to run
> >    host A 
> > (or "host A.foo.com") and get the IP address of the machine.  It looks like your machines are returning the loopback address.  It's possible that you just need to make sure that the /etc/hosts file on each machine has _its_own_ name in there (the one returned by hostname) and that it's set to the machine's actual IP address (and not 127.0.0.1).
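A small C sketch of the /etc/hosts check described above. It examines one line in the usual hosts-file layout (an address followed by one or more names) and reports whether a given name is mapped to a loopback address. Any address in 127.0.0.0/8 counts as loopback, so an entry such as 127.0.1.1, which some distributions create for the machine's own name, is flagged too. Only IPv4 is handled, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* True if `addr` is a dotted-quad IPv4 address in 127.0.0.0/8. */
static bool is_loopback_ipv4(const char *addr)
{
    struct in_addr a;
    if (inet_pton(AF_INET, addr, &a) != 1)
        return false;                 /* not an IPv4 address (e.g. ::1) */
    return (ntohl(a.s_addr) >> 24) == 127;
}

/* Examine one line in /etc/hosts format:
 *     address name [alias ...] [# comment]
 * Returns  1 if `name` is listed on a loopback address,
 *          0 if `name` is listed on some other address,
 *         -1 if the line does not mention `name` at all. */
int hosts_line_check(const char *line, const char *name)
{
    char buf[512];
    snprintf(buf, sizeof buf, "%s", line);   /* private, writable copy */

    char *hash = strchr(buf, '#');
    if (hash != NULL)
        *hash = '\0';                        /* ignore comments */

    char *save = NULL;
    const char *addr = strtok_r(buf, " \t\r\n", &save);
    if (addr == NULL)
        return -1;                           /* blank or comment-only line */

    for (char *tok = strtok_r(NULL, " \t\r\n", &save); tok != NULL;
         tok = strtok_r(NULL, " \t\r\n", &save)) {
        if (strcmp(tok, name) == 0)
            return is_loopback_ipv4(addr) ? 1 : 0;
    }
    return -1;
}
```

Fed each line of /etc/hosts together with the name printed by hostname, it should never return 1 on a correctly configured node; a line such as "127.0.0.1 localhost s051rhlapp01" returns 1 for s051rhlapp01.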
> > 
> > I'm not an expert in configuring networks, so I can't really be more specific.  Sorry.
> > 
> > -d 
> > 
> > On Apr 27, 2011, at 4:06 PM, Andy_Holland at URSCorp.com wrote:
> > 
> > > 
> > > The /etc/hosts file only has the short names in it.  I'm not exactly sure what the networking issue is that I need to let the sysadmin know about.  Can you please explain it to me? 
> > > 
> > > Thanks, 
> > > 
> > > Andy Holland
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > 04/27/2011 04:53 PM 
> > > 
> > > To
> > > Andy_Holland at URSCorp.com
> > > cc
> > > Subject
> > > Re: [mpich-discuss] Possible setup problem
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > How are the machines getting the IP address when using the full name?  If they're in /etc/hosts, then I would go ahead and add the short names there.  Otherwise, while adding the short names there will work, there's another network configuration problem that's causing this and may give you trouble in the future, so it might be worth finding a sysadmin to help you.  (I'm lucky enough to have great sysadmins here, so I don't (have to) know too much about configuring networking.)
> > > 
> > > -d
> > > 
> > > On Apr 27, 2011, at 3:46 PM, Andy_Holland at URSCorp.com wrote:
> > > 
> > > > 
> > > > I just tried the host command with the full name of the machine, including the domain, and it returns the correct IP address for each machine.  The /etc/hosts files on the machines do not include the domain in the machine name.  Maybe they should? 
> > > > 
> > > > Andy Holland
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > 04/27/2011 02:58 PM 
> > > > 
> > > > To
> > > > Andy_Holland at URSCorp.com
> > > > cc
> > > > Subject
> > > > Re: [mpich-discuss] Possible setup problem
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > I think I found the problem.  I should have checked this earlier.  It looks like your machines are set up to return 127.0.0.1 (the loopback address) when resolving their own hostname, rather than their actual IP address.
> > > > 
> > > > Try this on s051rhlapp01:
> > > >  hostname
> > > > It should return s051rhlapp01.  Then try:
> > > >  host s051rhlapp01
> > > > It should NOT return 127.0.0.1.  Then try the same thing on s051rhlapp02 (using its own name).
> > > > 
> > > > If you don't get what you should, it indicates a problem with your network configuration.
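The same check can be made from a program. One caveat: host sends its query straight to DNS and does not read /etc/hosts. Programs normally resolve names through the system resolver (getaddrinfo), which on a typical configuration consults /etc/hosts first; getent hosts s051rhlapp01 follows the same path a program would. The C sketch below, with hypothetical function names, resolves a name with getaddrinfo, prints each IPv4 address returned, and reports whether all of them are loopback addresses. check_own_hostname applies this to the name returned by gethostname.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Resolve `name` to IPv4 addresses with getaddrinfo() and print them.
 * Returns 1 if every address is in 127.0.0.0/8 (loopback only),
 *         0 if at least one address is not loopback,
 *        -1 if the name cannot be resolved. */
int resolves_only_to_loopback(const char *name)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, for simplicity */
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address */

    if (getaddrinfo(name, NULL, &hints, &res) != 0)
        return -1;

    int only_loopback = 1;
    for (struct addrinfo *p = res; p != NULL; p = p->ai_next) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)p->ai_addr;
        char text[INET_ADDRSTRLEN];
        if (inet_ntop(AF_INET, &sin->sin_addr, text, sizeof text) != NULL)
            printf("  %s -> %s\n", name, text);
        if ((ntohl(sin->sin_addr.s_addr) >> 24) != 127)
            only_loopback = 0;
    }
    freeaddrinfo(res);
    return only_loopback;
}

/* Apply the check to this machine's own name, as hostname reports it. */
int check_own_hostname(void)
{
    char name[256];
    if (gethostname(name, sizeof name) != 0)
        return -1;
    name[sizeof name - 1] = '\0';     /* may be unterminated if truncated */
    printf("hostname: %s\n", name);

    int r = resolves_only_to_loopback(name);
    if (r == 1)
        printf("WARNING: %s resolves only to a loopback address\n", name);
    else if (r == -1)
        printf("WARNING: %s does not resolve at all\n", name);
    return r;
}
```

Run on each node, a return value of 1 from check_own_hostname corresponds to the 127.0.0.1 situation described above.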
> > > > 
> > > > -d
> > > > 
> > > > On Apr 26, 2011, at 5:04 PM, Andy_Holland at URSCorp.com wrote:
> > > > 
> > > > > 
> > > > > Here ya go. 
> > > > > 
> > > > > 
> > > > > 
> > > > > Andy Holland
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > 04/26/2011 05:56 PM 
> > > > > 
> > > > > To
> > > > > Andy_Holland at URSCorp.com
> > > > > cc
> > > > > Subject
> > > > > Re: [mpich-discuss] Possible setup problem
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Oops, I forgot to mention that you need to recompile simple_test.c:
> > > > > 
> > > > >  mpicc simple_test.c -o simple_test
> > > > > 
> > > > > Can you try it again?
> > > > > 
> > > > > Thanks,
> > > > > -d
> > > > > 
> > > > > On Apr 26, 2011, at 3:45 PM, Andy_Holland at URSCorp.com wrote:
> > > > > 
> > > > > > 
> > > > > > Ok, I applied the patch, rebuilt both installations, and reran your test program.  Attached is the log file. 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Thank you, 
> > > > > > 
> > > > > > Andy Holland
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > 04/26/2011 02:20 PM 
> > > > > > 
> > > > > > To
> > > > > > Andy_Holland at URSCorp.com
> > > > > > cc
> > > > > > Subject
> > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Hmm.  I found a bug with error reporting.  While this won't directly fix your problem, it may help with identifying it.
> > > > > > 
> > > > > > Can you apply this patch, then rebuild and re-install mpich2 on both machines?
> > > > > > 
> > > > > >    (from the mpich2 source directory)
> > > > > >    patch -p0 < errno.patch
> > > > > >    make clean
> > > > > >    make
> > > > > >    make install
> > > > > > 
> > > > > > Then try the simple_test.c again and send us the log.
> > > > > > 
> > > > > > Thanks,
> > > > > > -d
> > > > > > 
> > > > > > [attachment "errno.patch" deleted by Andy Holland/Denver/URSCorp] 
> > > > > > 
> > > > > > On Apr 26, 2011, at 11:28 AM, Andy_Holland at URSCorp.com wrote:
> > > > > > 
> > > > > > > 
> > > > > > > Ok, I turned iptables off on both machines and reran it.  Attached is the log file. 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Andy Holland
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > 04/26/2011 11:13 AM
> > > > > > > Please respond to
> > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > 
> > > > > > > 
> > > > > > > To
> > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > cc
> > > > > > > Subject
> > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > For some reason, it's not showing the specific socket error, but it's happening when a process on s051rhlapp02 tries to send a message to a process on s051rhlapp01.  Can you try disabling the firewalls on the machines and running it again?
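A firewall between the nodes usually shows up as a TCP connect() that is refused or never completes. As a rough check that does not involve MPICH at all, the hypothetical C helper below tries to open a TCP connection to a given host and port. Which port to test is an assumption: the MPICH2 process manager picks its ports at run time, and the thread does not say which ones are used. Note also that a blocking connect() to a host that silently drops packets can take a long time to fail.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to open a TCP connection to host:port.
 * Returns  0 if some resolved address accepted the connection,
 *         -1 if host or port cannot be resolved,
 *         -2 if every address refused or failed to connect. */
int probe_tcp(const char *host, const char *port)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    int rc = -2;
    for (struct addrinfo *p = res; p != NULL; p = p->ai_next) {
        int fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        int ok = connect(fd, p->ai_addr, p->ai_addrlen) == 0;
        close(fd);                    /* we only wanted to know if it connects */
        if (ok) {
            rc = 0;
            break;
        }
    }
    freeaddrinfo(res);
    return rc;
}
```

For instance, probe_tcp("s051rhlapp01", "22") run on s051rhlapp02 should return 0 if an ssh server is listening on s051rhlapp01 and nothing in between blocks port 22; port 22 is only an example of a port that is normally open.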
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > -d
> > > > > > > 
> > > > > > > On Apr 25, 2011, at 5:39 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > 
> > > > > > > > 
> > > > > > > > Yeah, I put it in the wrong directory.  Ok, I reran in a shared area and I've attached the log file. 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Thanks, 
> > > > > > > > 
> > > > > > > > Andy Holland
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > 04/25/2011 05:45 PM
> > > > > > > > Please respond to
> > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > 
> > > > > > > > 
> > > > > > > > To
> > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > cc
> > > > > > > > Subject
> > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Andy,
> > > > > > > > 
> > > > > > > > Looking through the log file, I see a line that says:
> > > > > > > > 
> > > > > > > > [proxy:0:1 at s051rhlapp02] launch_procs (/usr/local/mpich2-1.3.2p1/src/pm/hydra/pm/pmiserv/pmip_cb.c:639): unable to change wdir to /home/andy_holland/mpich_test (No such file or directory)
> > > > > > > > 
> > > > > > > > Can you check that you can access /home/andy_holland/mpich_test from s051rhlapp02?
> > > > > > > > 
> > > > > > > > If not, put simple_test into a directory that's accessible from both machines, and try it again.
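The Hydra message above means the proxy on s051rhlapp02 could not change into the working directory. The hypothetical C helper below performs a comparable check (the path exists, is a directory, and can be entered) and prints the errno text on failure, so running it on each node with the intended working directory shows which node lacks access. It is a sketch, not the code Hydra itself runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Check that `dir` exists, is a directory, and can be entered.
 * Prints the reason to stderr and returns -1 on failure, 0 if usable. */
int check_wdir(const char *dir)
{
    struct stat st;
    if (stat(dir, &st) != 0) {
        /* e.g. "No such file or directory" if the path is not shared */
        fprintf(stderr, "cannot use %s: %s\n", dir, strerror(errno));
        return -1;
    }
    if (!S_ISDIR(st.st_mode)) {
        fprintf(stderr, "cannot use %s: not a directory\n", dir);
        return -1;
    }
    if (access(dir, X_OK) != 0) {     /* search permission is needed to chdir */
        fprintf(stderr, "cannot use %s: %s\n", dir, strerror(errno));
        return -1;
    }
    return 0;
}
```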
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > -d
> > > > > > > > 
> > > > > > > > On Apr 25, 2011, at 3:55 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Darius, 
> > > > > > > > >         Thanks.  If I had just thought for a second longer, I would have had it.  Attached is the log file for your test program. 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Andy Holland
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > 04/25/2011 04:32 PM
> > > > > > > > > Please respond to
> > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > To
> > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > cc
> > > > > > > > > Subject
> > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Sorry.  Just run:
> > > > > > > > >    mpicc simple_test.c -o simple_test
> > > > > > > > > 
> > > > > > > > > If you needed to specify the full path for mpiexec, use the same path for mpicc.  This will generate the executable called simple_test.
> > > > > > > > > 
> > > > > > > > > -d
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Apr 25, 2011, at 3:26 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Darius, 
> > > > > > > > > >         Thanks for your help with this.  You'll have to forgive me, though; I'm a Fortran programmer and I'm not exactly sure how to compile the program you sent me.  I have gcc, by the way. 
> > > > > > > > > > 
> > > > > > > > > > Thanks, 
> > > > > > > > > > 
> > > > > > > > > > Andy Holland
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > 04/25/2011 03:19 PM
> > > > > > > > > > Please respond to
> > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > To
> > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > cc
> > > > > > > > > > Subject
> > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > OK, can you try the attached test program with the same number of processes and machine file, but also add the -l option to mpiexec (to label the lines of output with the rank).
> > > > > > > > > > 
> > > > > > > > > > Thanks,
> > > > > > > > > > -d
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > [attachment "simple_test.c" deleted by Andy Holland/Denver/URSCorp] 
> > > > > > > > > > On Apr 25, 2011, at 2:00 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > I've attached the log for running cpi using the same machinefile. 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Thank you, 
> > > > > > > > > > > 
> > > > > > > > > > > Andy Holland
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > > 04/25/2011 02:51 PM
> > > > > > > > > > > Please respond to
> > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > To
> > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > cc
> > > > > > > > > > > Subject
> > > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Hi Andy,
> > > > > > > > > > > 
> > > > > > > > > > > Can you try running cpi from the examples directory of the MPICH2 source tree with the same number of processes and the same machine file?  Let us know if that works, and, if not, send us the output, please.
> > > > > > > > > > > 
> > > > > > > > > > > Thanks,
> > > > > > > > > > > -d
> > > > > > > > > > > 
> > > > > > > > > > > On Apr 25, 2011, at 1:30 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > It was suggested that I send out all the error messages.  I've attached a log file from the model. 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Thank you, 
> > > > > > > > > > > > 
> > > > > > > > > > > > Andy Holland
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Dave Goodell <goodell at mcs.anl.gov> 
> > > > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > > > 04/25/2011 02:22 PM
> > > > > > > > > > > > Please respond to
> > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > To
> > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > cc
> > > > > > > > > > > > Subject
> > > > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > On Apr 25, 2011, at 12:59 PM CDT, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > > When I run from either machine using CPUs from both machines, the run stops with many MPI messages.  Below is the last message in the list: 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > main (/usr/local/mpich2-1.3.2p1/src/pm/hydra/ui/mpich/mpiexec.c:404): process manager error waiting for completion 
> > > > > > > > > > > > 
> > > > > > > > > > > > Can you send us all of the error messages?  Typically the first error messages are the most useful/relevant; the last ones often are just messages announcing some sort of cleanup or secondary error caused by the original error.
> > > > > > > > > > > > 
> > > > > > > > > > > > -Dave
> > > > > > > > > > > > 
> > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > mpich-discuss mailing list
> > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > <run.cctm.parallel.txt>
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > <cpi_log.txt>
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > <simple_test_log.txt>
> > > > > > > > 
> > > > > > > > <simple_test_log.txt>
> > > > > > > 
> > > > > > > 
> > > > > > > <simple_test_log.txt>
> > > > > > 
> > > > > > 
> > > > > > <simple_test_log.txt>
> > > > > 
> > > > > 
> > > > > <simple_test_log.txt>
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> > <simple_test_log.txt>
> 
> 

