<br><font size=2 face="sans-serif">I thought of that, but the output files
never got any bigger, and I waited well past the time the run should have
finished to make sure.</font>
<br><font size=2 face="sans-serif"><br>
Andy Holland<br>
Air Quality Modeler<br>
URS Corporation<br>
1600 Perimeter Park Drive<br>
Suite 400<br>
Morrisville, NC 27560<br>
Direct: (303) 796-4694<br>
Cell: (919) 619-4218<br>
Fax: (919) 461-1415<br>
andy_holland@urscorp.com</font>
<br><font size=2 face="sans-serif"><br>
</font>
<table>
<tr>
<td><font size=1 color=#4f4f4f face="sans-serif">This e-mail and any attachments
contain URS Corporation confidential information that may be proprietary
or privileged. If you receive this message in error or are not the intended
recipient, you should not retain, distribute, disclose or use any of this
information and you should destroy the e-mail and any attachments or copies.</font></table>
<br><font size=2 face="sans-serif"><br>
<br>
</font>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td width=48%><font size=1 face="sans-serif"><b>Darius Buntinas <buntinas@mcs.anl.gov></b>
</font>
<br><font size=1 face="sans-serif">Sent by: mpich-discuss-bounces@mcs.anl.gov</font>
<p><font size=1 face="sans-serif">04/29/2011 11:27 AM</font>
<table border>
<tr valign=top>
<td bgcolor=white>
<div align=center><font size=1 face="sans-serif">Please respond to<br>
mpich-discuss@mcs.anl.gov</font></div></table>
<br>
<br>
<td width=51%>
<table width=100%>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">To</font></div>
<td><font size=1 face="sans-serif">mpich-discuss@mcs.anl.gov</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">cc</font></div>
<td>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">Subject</font></div>
<td><font size=1 face="sans-serif">Re: [mpich-discuss] Possible setup problem</font></table>
<br>
<table>
<tr valign=top>
<td>
<td></table>
<br></table>
<br>
<br>
<br><tt><font size=2><br>
tee does its own buffering as well. Are you sure the program hangs?
Did you wait long enough for it to finish running?<br>
<br>
I recently had a similar issue with a long running script. I thought
it hung, but tee just buffered the output till the program terminated,
so I didn't see any output on the screen or in the file till the end.<br>
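<br>
To illustrate the buffering effect described above: when a program's standard output is a pipe (as with tee) rather than a terminal, many runtimes buffer it in blocks, so lines may only show up when the buffer fills or the program exits. The following is a minimal Python sketch of the usual workaround, which is to flush after each line. The helper name emit_line is hypothetical, and a Fortran model like the one discussed here would need its own equivalent.<br>

```python
import io
import sys


def emit_line(text, stream=None):
    """Write one line to the stream and flush it immediately.

    Flushing after each line hands the data to the pipe right away instead
    of leaving it in the writer's own buffer until that buffer fills or the
    program exits.
    """
    if stream is None:
        stream = sys.stdout
    stream.write(text + "\n")
    stream.flush()


def stdout_is_terminal():
    """Report whether stdout is attached to a terminal (False when piped)."""
    return sys.stdout.isatty()


if __name__ == "__main__":
    emit_line("timestep 1 done")
    emit_line("stdout is a terminal: %s" % stdout_is_terminal())
```

Running the script once directly and once piped through tee shows the terminal check change, which is the condition that typically switches a runtime from line buffering to block buffering.<br>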
<br>
-d<br>
<br>
On Apr 29, 2011, at 10:18 AM, Andy_Holland@URSCorp.com wrote:<br>
<br>
> <br>
> This command doesn't work: <br>
> <br>
> run.cctm |& tee run.cctm.log <br>
> <br>
> This command does work: <br>
> <br>
> run.cctm > run.cct.log <br>
> <br>
> The run.cctm file is the run script. This is the mpich command
in that script: <br>
> <br>
> time /usr/local/mpich2/bin/mpirun -v -machinefile machine8 -np 16
$BASE/$EXEC <br>
> <br>
> <br>
> <br>
> <br>
> <br>
> <br>
> <br>
> Darius Buntinas <buntinas@mcs.anl.gov> <br>
> Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> 04/29/2011 11:14 AM<br>
> Please respond to<br>
> mpich-discuss@mcs.anl.gov<br>
> <br>
> <br>
> To<br>
> mpich-discuss@mcs.anl.gov<br>
> cc<br>
> Subject<br>
> Re: [mpich-discuss] Possible setup problem<br>
> <br>
> <br>
> <br>
> <br>
> <br>
> <br>
> Can you send us the command line you're using in both cases (where
it works and where it doesn't)?<br>
> <br>
> Thanks,<br>
> -d<br>
> <br>
> On Apr 29, 2011, at 10:08 AM, Andy_Holland@URSCorp.com wrote:<br>
> <br>
> > <br>
> > Darius, <br>
> > There is quite a bit of output from
the program. When I pipe the standard output the actual program never
starts, MPICH messages just fill the screen and keep going and going. It
does work just fine if I redirect the standard output to a file. <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > Darius Buntinas <buntinas@mcs.anl.gov> <br>
> > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > 04/29/2011 11:04 AM<br>
> > Please respond to<br>
> > mpich-discuss@mcs.anl.gov<br>
> > <br>
> > <br>
> > To<br>
> > Andy_Holland@URSCorp.com, mpich-discuss@mcs.anl.gov<br>
> > cc<br>
> > Subject<br>
> > Re: [mpich-discuss] Possible setup problem<br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > [Re-adding mpich-discuss]<br>
> > <br>
> > Is there a lot of output (e.g., a few pages, or a few MBs)? The
process manager is not designed to handle a lot of stdin/out traffic. If
you have a lot of data it's better to write it directly to a file.<br>
> > <br>
> > I think you said this was a Fortran program. I know there
is some trickiness with buffering I/O in Fortran. How do you know
the program is hanging? Does the program not finish in the expected
time, or do you just not see any output in the redirected file when you
expect it? If it's the latter, it could be that the output is being
buffered, in which case you might have to wait until the program terminates
before you see the output.<br>
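<br>
As a minimal sketch of writing output straight to a file instead of sending it through the process manager's stdout, each process can open its own log file. The example is in Python for brevity. The function name and the file-name pattern are hypothetical, and the rank is passed in explicitly rather than taken from any particular MPI library.<br>

```python
import os
import tempfile


def open_rank_log(directory, rank):
    """Open a line-buffered, per-process log file named after the rank."""
    path = os.path.join(directory, "rank_%04d.log" % rank)
    # buffering=1 selects line buffering in text mode, so each completed
    # line reaches the file without waiting for the program to exit.
    return open(path, "w", buffering=1)


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    with open_rank_log(log_dir, 0) as log:
        log.write("rank 0: starting\n")
    print(os.listdir(log_dir))
```

One file per process keeps large volumes of output away from the launcher, and line buffering makes progress visible in the file while the run is still going.<br>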
> > <br>
> > -d<br>
> > <br>
> > On Apr 29, 2011, at 9:52 AM, Andy_Holland@URSCorp.com wrote:<br>
> > <br>
> > > <br>
> > > The problem only occurs when I pipe the screen output to
a log file. If I don't do that, it runs fine. <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > Darius Buntinas <buntinas@mcs.anl.gov><br>
> > > 04/28/2011 05:15 PM <br>
> > > <br>
> > > To<br>
> > > Andy_Holland@URSCorp.com<br>
> > > cc<br>
> > > Subject<br>
> > > Re: [mpich-discuss] Possible setup problem<br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > It looks like the test program worked. <br>
> > > <br>
> > > Check whether your app works on one node. Also try
other applications to see if they work over two nodes.<br>
> > > <br>
> > > -d<br>
> > > <br>
> > > On Apr 28, 2011, at 3:38 PM, Andy_Holland@URSCorp.com wrote:<br>
> > > <br>
> > > > <br>
> > > > We have modified some files on the machines and now
when I do 'host s051rhlapp01' it gives me the actual IP address of the
machine. I've attached the log file for your simple test after this
correction. I think it completed successfully, but wanted to check
with you. <br>
> > > > <br>
> > > > The model I'm trying to run using MPICH starts off
fine now, but then hangs at a certain point. I'm not sure whether there is
still a problem. <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > Thanks, <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > Darius Buntinas <buntinas@mcs.anl.gov><br>
> > > > 04/27/2011 05:13 PM <br>
> > > > <br>
> > > > To<br>
> > > > Andy_Holland@URSCorp.com<br>
> > > > cc<br>
> > > > Subject<br>
> > > > Re: [mpich-discuss] Possible setup problem<br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > The problem is that machine A is unable to determine
what its IP address is from its hostname. So if you do a<br>
> > > > hostname<br>
> > > > from machine A, it should return A (or A.foo.com).
Then you should be able to do<br>
> > > > host A <br>
> > > > (or "host A.foo.com") and get the IP address
of the machine. It looks like your machines are returning the loopback
address. It's possible that you just need to make sure that the /etc/hosts
file on each machine has _its_own_ name in there (the one returned by hostname)
and that it's set to the machine's actual IP address (and not 127.0.0.1).<br>
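<br>
As a rough cross-check of the lookup described above, the following Python sketch resolves the local hostname and reports whether the result is a loopback address. The function names are hypothetical, and the script uses the system resolver, which may consult different sources than the host command, so treat it as a sanity check and not as a substitute for the steps above.<br>

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the address string is a loopback address."""
    return ipaddress.ip_address(address).is_loopback


def check_self_resolution():
    """Resolve this machine's own hostname and flag a loopback result."""
    name = socket.gethostname()
    address = socket.gethostbyname(name)
    return name, address, is_loopback(address)


if __name__ == "__main__":
    try:
        name, address, loopback = check_self_resolution()
    except OSError as exc:
        print("could not resolve own hostname: %s" % exc)
    else:
        status = "loopback (problem)" if loopback else "ok"
        print("%s resolves to %s: %s" % (name, address, status))
```

Running it on each machine in the machine file gives a quick per-host report. A loopback result on any host matches the symptom discussed in this thread.<br>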
> > > > <br>
> > > > I'm not an expert in configuring networks, so I can't
really be more specific. Sorry.<br>
> > > > <br>
> > > > -d <br>
> > > > <br>
> > > > On Apr 27, 2011, at 4:06 PM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > <br>
> > > > > <br>
> > > > > The /etc/hosts file only has the short names in
it. I'm not exactly sure what the networking issue is that I need
to let the sysadmin know about. Can you please explain it to me?
<br>
> > > > > <br>
> > > > > Thanks, <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > Darius Buntinas <buntinas@mcs.anl.gov><br>
> > > > > 04/27/2011 04:53 PM <br>
> > > > > <br>
> > > > > To<br>
> > > > > Andy_Holland@URSCorp.com<br>
> > > > > cc<br>
> > > > > Subject<br>
> > > > > Re: [mpich-discuss] Possible setup problem<br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > How are the machines getting the IP address when
using the full name? If they're in /etc/hosts, then I would go ahead
and add the short names there. Otherwise, while adding the short
names there will work, there's another network configuration problem that's
causing this and may give you trouble in the future, so it might be worth
it to find a sysadmin to help you (I'm lucky enough to have great sysadmins
here, so I don't (have to) know too much about configuring networking.).<br>
> > > > > <br>
> > > > > -d<br>
> > > > > <br>
> > > > > On Apr 27, 2011, at 3:46 PM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > > <br>
> > > > > > <br>
> > > > > > I just tried doing the host command with
the full name of the machine including the domain and it is returning the
correct IP address for each machine. The /etc/hosts files on the
machines do not include the domain in the machine name. Maybe they
should? <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > Darius Buntinas <buntinas@mcs.anl.gov><br>
> > > > > > 04/27/2011 02:58 PM <br>
> > > > > > <br>
> > > > > > To<br>
> > > > > > Andy_Holland@URSCorp.com<br>
> > > > > > cc<br>
> > > > > > Subject<br>
> > > > > > Re: [mpich-discuss] Possible setup problem<br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > I think I found the problem. I should
have checked this earlier. It looks like your machines are set up
to return 127.0.0.1 (the loopback address) when resolving their own hostname,
rather than their actual IP address.<br>
> > > > > > <br>
> > > > > > Try this on s051rhlapp01:<br>
> > > > > > hostname<br>
> > > > > > It should return s051rhlapp01. Then
try:<br>
> > > > > > host s051rhlapp01<br>
> > > > > > It should NOT return 127.0.0.1. Then
try the same thing on s051rhlapp02 (using its own name).<br>
> > > > > > <br>
> > > > > > If you don't get what you should, it indicates
a problem with your network configuration.<br>
> > > > > > <br>
> > > > > > -d<br>
> > > > > > <br>
> > > > > > On Apr 26, 2011, at 5:04 PM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > > > <br>
> > > > > > > <br>
> > > > > > > Here ya go. <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > Darius Buntinas <buntinas@mcs.anl.gov><br>
> > > > > > > 04/26/2011 05:56 PM <br>
> > > > > > > <br>
> > > > > > > To<br>
> > > > > > > Andy_Holland@URSCorp.com<br>
> > > > > > > cc<br>
> > > > > > > Subject<br>
> > > > > > > Re: [mpich-discuss] Possible setup problem<br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > Oops I forgot to mention that you need
to recompile the simple_test file:<br>
> > > > > > > <br>
> > > > > > > mpicc simple_test.c -o simple_test<br>
> > > > > > > <br>
> > > > > > > Can you try it again?<br>
> > > > > > > <br>
> > > > > > > Thanks,<br>
> > > > > > > -d<br>
> > > > > > > <br>
> > > > > > > On Apr 26, 2011, at 3:45 PM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > Ok, I applied the patch and rebuilt
both installations and reran your test program. Attached is the log
file. <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > Thank you, <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > Darius Buntinas <buntinas@mcs.anl.gov><br>
> > > > > > > > 04/26/2011 02:20 PM <br>
> > > > > > > > <br>
> > > > > > > > To<br>
> > > > > > > > Andy_Holland@URSCorp.com<br>
> > > > > > > > cc<br>
> > > > > > > > Subject<br>
> > > > > > > > Re: [mpich-discuss] Possible setup
problem<br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > Hmm. I found a bug with error
reporting. While this won't directly fix your problem, it may help
with identifying it.<br>
> > > > > > > > <br>
> > > > > > > > Can you apply this patch, then
rebuild and re-install mpich2 on both machines?<br>
> > > > > > > > <br>
> > > > > > > > (from the mpich2 source
directory)<br>
> > > > > > > > patch -p0 < errno.patch<br>
> > > > > > > > make clean<br>
> > > > > > > > make<br>
> > > > > > > > make install<br>
> > > > > > > > <br>
> > > > > > > > Then try the simple_test.c again
and send us the log.<br>
> > > > > > > > <br>
> > > > > > > > Thanks,<br>
> > > > > > > > -d<br>
> > > > > > > > <br>
> > > > > > > > [attachment "errno.patch"
deleted by Andy Holland/Denver/URSCorp] <br>
> > > > > > > > <br>
> > > > > > > > On Apr 26, 2011, at 11:28 AM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > Ok, I turned iptables off
on both machines and reran it. Attached is the log file. <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > Darius Buntinas <buntinas@mcs.anl.gov>
<br>
> > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > 04/26/2011 11:13 AM<br>
> > > > > > > > > Please respond to<br>
> > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > To<br>
> > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > cc<br>
> > > > > > > > > Subject<br>
> > > > > > > > > Re: [mpich-discuss] Possible
setup problem<br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > For some reason, it's not
showing the specific socket error, but it's happening when a process on
s051rhlapp02 tries to send a message to a process on s051rhlapp01. Can
you try disabling the firewalls on the machines and try it again?<br>
> > > > > > > > > <br>
> > > > > > > > > Thanks,<br>
> > > > > > > > > -d<br>
> > > > > > > > > <br>
> > > > > > > > > On Apr 25, 2011, at 5:39 PM,
Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > Yeah, I put it in the
wrong directory. Ok, I reran in a shared area and I've attached the
log file. <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > Thanks, <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > Darius Buntinas <buntinas@mcs.anl.gov>
<br>
> > > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > > 04/25/2011 05:45 PM<br>
> > > > > > > > > > Please respond to<br>
> > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > To<br>
> > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > cc<br>
> > > > > > > > > > Subject<br>
> > > > > > > > > > Re: [mpich-discuss] Possible
setup problem<br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > Andy,<br>
> > > > > > > > > > <br>
> > > > > > > > > > Looking through the log
file, I see a line that says:<br>
> > > > > > > > > > <br>
> > > > > > > > > > [proxy:0:1@s051rhlapp02]
launch_procs (/usr/local/mpich2-1.3.2p1/src/pm/hydra/pm/pmiserv/pmip_cb.c:639):
unable to change wdir to /home/andy_holland/mpich_test (No such file or
directory)<br>
> > > > > > > > > > <br>
> > > > > > > > > > Can you check that you
can access /home/andy_holland/mpich_test from s051rhlapp02 ?<br>
> > > > > > > > > > <br>
> > > > > > > > > > If not, put simple_test
into a directory that's accessible from both machines, and try it again.<br>
> > > > > > > > > > <br>
> > > > > > > > > > Thanks,<br>
> > > > > > > > > > -d<br>
> > > > > > > > > > <br>
> > > > > > > > > > On Apr 25, 2011, at 3:55
PM, Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > Darius, <br>
> > > > > > > > > > >
Thanks. If I had just thought for a second longer I would
have had it. Attached is the log file for your test program. <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > Darius Buntinas
<buntinas@mcs.anl.gov> <br>
> > > > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > > > 04/25/2011 04:32
PM<br>
> > > > > > > > > > > Please respond to<br>
> > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > To<br>
> > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > cc<br>
> > > > > > > > > > > Subject<br>
> > > > > > > > > > > Re: [mpich-discuss]
Possible setup problem<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > Sorry. Just
run:<br>
> > > > > > > > > > > mpicc
simple_test.c -o simple_test<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > If you needed to
specify the full path for mpiexec, use the same path for mpicc. This
will generate the executable called simple_test.<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > -d<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > On Apr 25, 2011,
at 3:26 PM, Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > Darius, <br>
> > > > > > > > > > > >
Thanks for your help with this. You'll have to forgive
me though, I'm a Fortran programmer and I'm not exactly sure how to compile
the program you sent me. I have gcc by the way. <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > Thanks, <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > Darius Buntinas
<buntinas@mcs.anl.gov> <br>
> > > > > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > > > > 04/25/2011
03:19 PM<br>
> > > > > > > > > > > > Please respond
to<br>
> > > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > To<br>
> > > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > cc<br>
> > > > > > > > > > > > Subject<br>
> > > > > > > > > > > > Re: [mpich-discuss]
Possible setup problem</font></tt>
<br><tt><font size=2>> > > > > > > > > >
> > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > OK, can you
try the attached test program with the same number of processes and machine
file, but also add the -l option to mpiexec (to label the lines of output
with the rank).<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > Thanks,<br>
> > > > > > > > > > > > -d<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > [attachment
"simple_test.c" deleted by Andy Holland/Denver/URSCorp] <br>
> > > > > > > > > > > > On Apr 25,
2011, at 2:00 PM, Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > I've attached
the log for running cpi using the same machinefile. <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > Thank
you, <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > Darius
Buntinas <buntinas@mcs.anl.gov> <br>
> > > > > > > > > > > > > Sent by:
mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > > > > > 04/25/2011
02:51 PM<br>
> > > > > > > > > > > > > Please
respond to<br>
> > > > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > To<br>
> > > > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > > cc<br>
> > > > > > > > > > > > > Subject<br>
> > > > > > > > > > > > > Re: [mpich-discuss]
Possible setup problem<br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > Hi Andy,<br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > Can you
try running cpi from the examples directory of the MPICH2 source tree with
the same number of processes and the same machine file? Let us know
if that works, and, if not, send us the output, please.<br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > Thanks,<br>
> > > > > > > > > > > > > -d<br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > On Apr
25, 2011, at 1:30 PM, Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > It
was suggested that I send out all the error messages. I've attached
a log file from the model. <br>
> > > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > Thank
you, <br>
> > > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > Dave Goodell <goodell@mcs.anl.gov> <br>
> > > > > > > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > > > > > > 04/25/2011 02:22 PM<br>
> > > > > > > > > > > > > > Please respond to: mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > > > To: mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > > > cc:<br>
> > > > > > > > > > > > > > Subject: Re: [mpich-discuss] Possible setup problem<br>
> > > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > On
Apr 25, 2011, at 12:59 PM CDT, Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > > When I run from either machine using CPUs from both machines, the run stops with many MPI messages. Below is the last message in the list: <br>
> > > > > > > > > > > > > > >
<br>
> > > > > > > > > > > > > > >
main (/usr/local/mpich2-1.3.2p1/src/pm/hydra/ui/mpich/mpiexec.c:404): process
manager error waiting for completion <br>
> > > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > Can
you send us all of the error messages? Typically the first error
messages are the most useful/relevant; the last ones often are just messages
announcing some sort of cleanup or secondary error caused by the original
error.<br>
> > > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > -Dave<br>
> > > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > <run.cctm.parallel.txt><br>
> > > > > > > > > > > > > <cpi_log.txt><br>
> > > > > > > > > > > <simple_test_log.txt><br>
> > > > > > > > > > <simple_test_log.txt><br>
> > > > > > > > > <simple_test_log.txt><br>
> > > > > > > > <simple_test_log.txt><br>
> > > > > > > <simple_test_log.txt><br>
> > > > <simple_test_log.txt><br>
<br>
_______________________________________________<br>
mpich-discuss mailing list<br>
mpich-discuss@mcs.anl.gov<br>
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss<br>
</font></tt>
<br>