<br><font size=2 face="sans-serif">Darius,</font>
<br><font size=2 face="sans-serif"> There
is quite a bit of output from the program. When I pipe the standard
output, the actual program never starts; MPICH messages just fill the screen
and keep going. It works fine if I redirect the
standard output to a file.</font>
<br><font size=2 face="sans-serif"><br>
Andy Holland<br>
Air Quality Modeler<br>
URS Corporation<br>
1600 Perimeter Park Drive<br>
Suite 400<br>
Morrisville, NC 27560<br>
Direct: (303) 796-4694<br>
Cell: (919) 619-4218<br>
Fax: (919) 461-1415<br>
andy_holland@urscorp.com</font>
<br><font size=2 face="sans-serif"><br>
</font>
<table>
<tr>
<td><font size=1 color=#4f4f4f face="sans-serif">This e-mail and any attachments
contain URS Corporation confidential information that may be proprietary
or privileged. If you receive this message in error or are not the intended
recipient, you should not retain, distribute, disclose or use any of this
information and you should destroy the e-mail and any attachments or copies.</font></table>
<br><font size=2 face="sans-serif"><br>
<br>
</font>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td width=48%><font size=1 face="sans-serif"><b>Darius Buntinas <buntinas@mcs.anl.gov></b>
</font>
<br><font size=1 face="sans-serif">Sent by: mpich-discuss-bounces@mcs.anl.gov</font>
<p><font size=1 face="sans-serif">04/29/2011 11:04 AM</font>
<table border>
<tr valign=top>
<td bgcolor=white>
<div align=center><font size=1 face="sans-serif">Please respond to<br>
mpich-discuss@mcs.anl.gov</font></div></table>
<br>
<br>
<td width=51%>
<table width=100%>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">To</font></div>
<td><font size=1 face="sans-serif">Andy_Holland@URSCorp.com, mpich-discuss@mcs.anl.gov</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">cc</font></div>
<td>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">Subject</font></div>
<td><font size=1 face="sans-serif">Re: [mpich-discuss] Possible setup problem</font></table>
<br>
<br></table>
<br>
<br>
<br><tt><font size=2>[Re-adding mpich-discuss]<br>
<br>
Is there a lot of output (e.g., a few pages, or a few MBs)? The process
manager is not designed to handle a lot of stdin/out traffic. If
you have a lot of data it's better to write it directly to a file.<br>
<br>
I think you said this was a Fortran program. I know there is some
trickiness with I/O buffering in Fortran. How do you know the program
is hanging? Does the program not finish in the expected time, or
do you just not see any output in the redirected file when you expect it?
If it's the latter, it could be that the output is being buffered,
in which case you might have to wait until the program terminates before
you see the output.<br>
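<br>
As a rough illustration of the "write it directly to a file" idea, here is a minimal Python sketch. It is only an analogy, since the model in question is Fortran; the function name log_line is made up for this example and is not part of MPICH. It appends a line to a file and flushes it immediately, so the line is visible before the program terminates:<br>
<pre>
```python
import os


def log_line(path, message):
    """Append one line to `path` and push it to disk right away."""
    with open(path, "a") as handle:
        handle.write(message + "\n")
        handle.flush()             # empty Python's own buffer
        os.fsync(handle.fileno())  # ask the OS to write the data out
```
</pre>
Each process would call something like log_line("rank0.log", "step 1 done") with its own file name (the name here is hypothetical). In Fortran, the analogous step would presumably be writing to a unit opened on a file and flushing that unit (Fortran 2003 provides a FLUSH statement).<br>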
<br>
-d<br>
<br>
On Apr 29, 2011, at 9:52 AM, Andy_Holland@URSCorp.com wrote:<br>
<br>
> <br>
> The problem only occurs when I pipe the screen output to a log file.
If I don't do that, it runs fine. <br>
> <br>
> <br>
> <br>
> <br>
> <br>
> <br>
> <br>
> Darius Buntinas <buntinas@mcs.anl.gov><br>
> 04/28/2011 05:15 PM <br>
> <br>
> To<br>
> Andy_Holland@URSCorp.com<br>
> cc<br>
> Subject<br>
> Re: [mpich-discuss] Possible setup problem<br>
> <br>
> <br>
> <br>
> <br>
> <br>
> <br>
> It looks like the test program worked. <br>
> <br>
> Check whether your app works on one node. Also try other applications
to see if they work over two nodes.<br>
> <br>
> -d<br>
> <br>
> On Apr 28, 2011, at 3:38 PM, Andy_Holland@URSCorp.com wrote:<br>
> <br>
> > <br>
> > We have modified some files on the machines and now when I do
'host s051rhlapp01' it gives me the actual IP address of the machine. I've
attached the log file for your simple test after this correction. I
think it completed successfully, but wanted to check with you. <br>
> > <br>
> > The model I'm trying to run using MPICH starts off fine now,
but then hangs at a certain point. I'm not sure whether there is still a problem
or not. <br>
> > <br>
> > <br>
> > <br>
> > Thanks, <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > Darius Buntinas <buntinas@mcs.anl.gov><br>
> > 04/27/2011 05:13 PM <br>
> > <br>
> > To<br>
> > Andy_Holland@URSCorp.com<br>
> > cc<br>
> > Subject<br>
> > Re: [mpich-discuss] Possible setup problem<br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > The problem is that machine A is unable to determine what its
IP address is from its hostname. So if you do a<br>
> > hostname<br>
> > from machine A, it should return A (or A.foo.com). Then
you should be able to do<br>
> > host A <br>
> > (or "host A.foo.com") and get the IP address of the
machine. It looks like your machines are returning the loopback address.
It's possible that you just need to make sure that the /etc/hosts
file on each machine has _its_own_ name in there (the one returned by hostname)
and that it's set to the machine's actual IP address (and not 127.0.0.1).<br>
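<br>
The /etc/hosts check described above can be sketched in Python roughly as follows. This is an illustration under assumptions, not an MPICH tool: the function names addresses_for and maps_to_loopback are invented here, and the sketch only inspects hosts-file text that is passed in, not DNS:<br>
<pre>
```python
import ipaddress


def addresses_for(hosts_text, name):
    """Return the addresses that hosts-file text assigns to `name`."""
    found = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if name in names:
            found.append(address)
    return found


def _is_loopback(address):
    """True for loopback addresses; unparsable entries count as False."""
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        return False


def maps_to_loopback(hosts_text, name):
    """True if any hosts entry for `name` is a loopback address."""
    return any(_is_loopback(a) for a in addresses_for(hosts_text, name))
```
</pre>
On each machine one could read /etc/hosts into a string and call maps_to_loopback(text, socket.gethostname()); a True result would match the symptom described in this message.<br>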
> > <br>
> > I'm not an expert in configuring networks, so I can't really
be more specific. Sorry.<br>
> > <br>
> > -d <br>
> > <br>
> > On Apr 27, 2011, at 4:06 PM, Andy_Holland@URSCorp.com wrote:<br>
> > <br>
> > > <br>
> > > The /etc/hosts file only has the short names in it. I'm
not exactly sure what the networking issue is that I need to let the sysadmin
know about. Can you please explain it to me? <br>
> > > <br>
> > > Thanks, <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > Darius Buntinas <buntinas@mcs.anl.gov><br>
> > > 04/27/2011 04:53 PM <br>
> > > <br>
> > > To<br>
> > > Andy_Holland@URSCorp.com<br>
> > > cc<br>
> > > Subject<br>
> > > Re: [mpich-discuss] Possible setup problem<br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > How are the machines getting the IP address when using the
full name? If they're in /etc/hosts, then I would go ahead and add
the short names there. Otherwise, while adding the short names there
will work, there's another network configuration problem that's causing
this and may give you trouble in the future, so it might be worth it to
find a sysadmin to help you. (I'm lucky enough to have great sysadmins here,
so I don't (have to) know too much about configuring networking.)<br>
> > > <br>
> > > -d<br>
> > > <br>
> > > On Apr 27, 2011, at 3:46 PM, Andy_Holland@URSCorp.com wrote:<br>
> > > <br>
> > > > <br>
> > > > I just tried doing the host command with the full name
of the machine including the domain and it is returning the correct IP
address for each machine. The /etc/hosts files on the machines do
not include the domain in the machine name. Maybe they should? <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > Darius Buntinas <buntinas@mcs.anl.gov><br>
> > > > 04/27/2011 02:58 PM <br>
> > > > <br>
> > > > To<br>
> > > > Andy_Holland@URSCorp.com<br>
> > > > cc<br>
> > > > Subject<br>
> > > > Re: [mpich-discuss] Possible setup problem<br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > I think I found the problem. I should have checked
this earlier. It looks like your machines are set up to return 127.0.0.1
(the loopback address) when resolving their own hostname, rather than their
actual IP address.<br>
> > > > <br>
> > > > Try this on s051rhlapp01:<br>
> > > > hostname<br>
> > > > It should return s051rhlapp01. Then try:<br>
> > > > host s051rhlapp01<br>
> > > > It should NOT return 127.0.0.1. Then try the
same thing on s051rhlapp02 (using its own name).<br>
> > > > <br>
> > > > If you don't get what you should, it indicates a problem
with your network configuration.<br>
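<br>
A similar self-resolution check can be sketched in Python. This is a hedged illustration, not part of MPICH: the function name self_resolution is invented here. It uses the system resolver, which may consult /etc/hosts as well as DNS, so its answer can differ from what the host command prints:<br>
<pre>
```python
import ipaddress
import socket


def self_resolution(hostname=None, resolver=socket.gethostbyname):
    """Resolve this machine's name; return (name, address, looks_ok).

    looks_ok is False when the name resolves to a loopback address
    such as 127.0.0.1, which is the symptom described above.
    """
    name = hostname or socket.gethostname()
    address = resolver(name)
    looks_ok = not ipaddress.ip_address(address).is_loopback
    return name, address, looks_ok
```
</pre>
Running print(self_resolution()) on each machine should show the machine's own name, the address it resolves to, and whether that address looks usable by other nodes. The resolver parameter exists only so the function can be exercised without a real network.<br>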
> > > > <br>
> > > > -d<br>
> > > > <br>
> > > > On Apr 26, 2011, at 5:04 PM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > <br>
> > > > > <br>
> > > > > Here ya go. <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > Darius Buntinas <buntinas@mcs.anl.gov><br>
> > > > > 04/26/2011 05:56 PM <br>
> > > > > <br>
> > > > > To<br>
> > > > > Andy_Holland@URSCorp.com<br>
> > > > > cc<br>
> > > > > Subject<br>
> > > > > Re: [mpich-discuss] Possible setup problem<br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > Oops I forgot to mention that you need to recompile
the simple_test file:<br>
> > > > > <br>
> > > > > mpicc simple_test.c -o simple_test<br>
> > > > > <br>
> > > > > Can you try it again?<br>
> > > > > <br>
> > > > > Thanks,<br>
> > > > > -d<br>
> > > > > <br>
> > > > > On Apr 26, 2011, at 3:45 PM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > > <br>
> > > > > > <br>
> > > > > > Ok, I applied the patch and rebuilt both
installations and reran your test program. Attached is the log file.
<br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > Thank you, <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > Darius Buntinas <buntinas@mcs.anl.gov><br>
> > > > > > 04/26/2011 02:20 PM <br>
> > > > > > <br>
> > > > > > To<br>
> > > > > > Andy_Holland@URSCorp.com<br>
> > > > > > cc<br>
> > > > > > Subject<br>
> > > > > > Re: [mpich-discuss] Possible setup problem<br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > Hmm. I found a bug with error reporting.
While this won't directly fix your problem, it may help with identifying
it.<br>
> > > > > > <br>
> > > > > > Can you apply this patch, then rebuild and
re-install mpich2 on both machines?<br>
> > > > > > <br>
> > > > > > (from the mpich2 source directory)<br>
> > > > > > patch -p0 < errno.patch<br>
> > > > > > make clean<br>
> > > > > > make<br>
> > > > > > make install<br>
> > > > > > <br>
> > > > > > Then try the simple_test.c again and send
us the log.<br>
> > > > > > <br>
> > > > > > Thanks,<br>
> > > > > > -d<br>
> > > > > > <br>
> > > > > > [attachment "errno.patch" deleted
by Andy Holland/Denver/URSCorp] <br>
> > > > > > <br>
> > > > > > On Apr 26, 2011, at 11:28 AM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > > > <br>
> > > > > > > <br>
> > > > > > > Ok, I turned iptables off on both machines
and reran it. Attached is the log file. <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > Darius Buntinas <buntinas@mcs.anl.gov>
<br>
> > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > 04/26/2011 11:13 AM<br>
> > > > > > > Please respond to<br>
> > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > To<br>
> > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > cc<br>
> > > > > > > Subject<br>
> > > > > > > Re: [mpich-discuss] Possible setup problem<br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > For some reason, it's not showing the
specific socket error, but it's happening when a process on s051rhlapp02
tries to send a message to a process on s051rhlapp01. Can you try
disabling the firewalls on the machines and try it again?<br>
> > > > > > > <br>
> > > > > > > Thanks,<br>
> > > > > > > -d<br>
> > > > > > > <br>
> > > > > > > On Apr 25, 2011, at 5:39 PM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > Yeah, I put it in the wrong directory.
Ok, I reran in a shared area and I've attached the log file. <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > Thanks, <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > Darius Buntinas <buntinas@mcs.anl.gov>
<br>
> > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > 04/25/2011 05:45 PM<br>
> > > > > > > > Please respond to<br>
> > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > To<br>
> > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > cc<br>
> > > > > > > > Subject<br>
> > > > > > > > Re: [mpich-discuss] Possible setup
problem<br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > Andy,<br>
> > > > > > > > <br>
> > > > > > > > Looking through the log file, I
see a line that says:<br>
> > > > > > > > <br>
> > > > > > > > [proxy:0:1@s051rhlapp02] launch_procs
(/usr/local/mpich2-1.3.2p1/src/pm/hydra/pm/pmiserv/pmip_cb.c:639): unable
to change wdir to /home/andy_holland/mpich_test (No such file or directory)<br>
> > > > > > > > <br>
> > > > > > > > Can you check that you can access
/home/andy_holland/mpich_test from s051rhlapp02 ?<br>
> > > > > > > > <br>
> > > > > > > > If not, put simple_test into a
directory that's accessible from both machines, and try it again.<br>
> > > > > > > > <br>
> > > > > > > > Thanks,<br>
> > > > > > > > -d<br>
> > > > > > > > <br>
> > > > > > > > On Apr 25, 2011, at 3:55 PM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > Darius, <br>
> > > > > > > > >
Thanks. If I had just thought for a second longer I would have had
it. Attached is the log file for your test program. <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > Darius Buntinas <buntinas@mcs.anl.gov>
<br>
> > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > 04/25/2011 04:32 PM<br>
> > > > > > > > > Please respond to<br>
> > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > To<br>
> > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > cc<br>
> > > > > > > > > Subject<br>
> > > > > > > > > Re: [mpich-discuss] Possible
setup problem<br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > Sorry. Just run:<br>
> > > > > > > > > mpicc simple_test.c
-o simple_test<br>
> > > > > > > > > <br>
> > > > > > > > > If you needed to specify the
full path for mpiexec, use the same path for mpicc. This will generate
the executable called simple_test.<br>
> > > > > > > > > <br>
> > > > > > > > > -d<br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > On Apr 25, 2011, at 3:26 PM,
Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > Darius, <br>
> > > > > > > > > >
Thanks for your help with this. You'll have to forgive me
though, I'm a Fortran programmer and I'm not exactly sure how to compile
the program you sent me. I have gcc by the way. <br>
> > > > > > > > > > <br>
> > > > > > > > > > Thanks, <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > Darius Buntinas <buntinas@mcs.anl.gov>
<br>
> > > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > > 04/25/2011 03:19 PM<br>
> > > > > > > > > > Please respond to<br>
> > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > To<br>
> > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > cc<br>
> > > > > > > > > > Subject<br>
> > > > > > > > > > Re: [mpich-discuss] Possible
setup problem<br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > OK, can you try the attached
test program with the same number of processes and machine file, but also
add the -l option to mpiexec (to label the lines of output with the rank).<br>
> > > > > > > > > > <br>
> > > > > > > > > > Thanks,<br>
> > > > > > > > > > -d<br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > [attachment "simple_test.c"
deleted by Andy Holland/Denver/URSCorp] <br>
> > > > > > > > > > On Apr 25, 2011, at 2:00
PM, Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > I've attached the
log for running cpi using the same machinefile. <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > Thank you, <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > Darius Buntinas
<buntinas@mcs.anl.gov> <br>
> > > > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > > > 04/25/2011 02:51
PM<br>
> > > > > > > > > > > Please respond to<br>
> > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > To<br>
> > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > cc<br>
> > > > > > > > > > > Subject<br>
> > > > > > > > > > > Re: [mpich-discuss]
Possible setup problem<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > Hi Andy,<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > Can you try running
cpi from the examples directory of the MPICH2 source tree with the same
number of processes and the same machine file? Let us know if that
works, and, if not, send us the output, please.<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > Thanks,<br>
> > > > > > > > > > > -d<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > On Apr 25, 2011,
at 1:30 PM, Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > It was suggested
that I send out all the error messages. I've attached a log file
from the model. <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > Thank you,
<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > Dave Goodell
<goodell@mcs.anl.gov> <br>
> > > > > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > > > > 04/25/2011
02:22 PM<br>
> > > > > > > > > > > > Please respond
to<br>
> > > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > To<br>
> > > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > cc<br>
> > > > > > > > > > > > Subject<br>
> > > > > > > > > > > > Re: [mpich-discuss]
Possible setup problem<br>
> > > > > > > > > > > > </font></tt>
<br><tt><font size=2>> > > > > > > > > >
> > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > On Apr 25,
2011, at 12:59 PM CDT, Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > > When I
run from either machine using CPUs from both machines the run stops with
many mpi messages. Below is the last message in the list: <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > main (/usr/local/mpich2-1.3.2p1/src/pm/hydra/ui/mpich/mpiexec.c:404):
process manager error waiting for completion <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > Can you send
us all of the error messages? Typically the first error messages
are the most useful/relevant; the last ones often are just messages announcing
some sort of cleanup or secondary error caused by the original error.<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > -Dave<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > _______________________________________________<br>
> > > > > > > > > > > > mpich-discuss
mailing list<br>
> > > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss<br>
> > > > > > > > > > > > <br>
<br>
_______________________________________________<br>
mpich-discuss mailing list<br>
mpich-discuss@mcs.anl.gov<br>
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss<br>
</font></tt>
<br>