<br><font size=2 face="sans-serif">This command doesn't work:</font>
<br>
<br><font size=2 face="sans-serif">run.cctm |& tee run.cctm.log</font>
<br>
<br><font size=2 face="sans-serif">This command does work:</font>
<br>
<br><font size=2 face="sans-serif">run.cctm > run.cct.log</font>
<br>
<br><font size=2 face="sans-serif">The run.cctm file is the run script.
This is the mpich command in that script:</font>
<br>
<br><font size=2 face="sans-serif">time /usr/local/mpich2/bin/mpirun -v
-machinefile machine8 -np 16 $BASE/$EXEC</font>
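<br>
<br><font size=2 face="sans-serif">For reference, a minimal sketch of what the first command asks the shell to do, assuming a Bourne-style shell. The |& operator (csh, tcsh, and bash 4 or later) pipes both standard output and standard error; the portable spelling is 2>&1 followed by a pipe. The noisy function and the temporary log file are hypothetical stand-ins for run.cctm and run.cctm.log.</font>

```shell
# Hypothetical stand-in for the run script: one line to stdout, one to stderr.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

log=$(mktemp)

# Portable spelling of `noisy |& tee "$log"`:
# merge stderr into stdout first, then pipe both streams into tee.
noisy 2>&1 | tee "$log"
```

<font size=2 face="sans-serif">Either spelling sends everything the job prints through a pipe; a plain redirect with > writes to the file without one.</font>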
<br><font size=2 face="sans-serif"><br>
Andy Holland<br>
Air Quality Modeler<br>
URS Corporation<br>
1600 Perimeter Park Drive<br>
Suite 400<br>
Morrisville, NC 27560<br>
Direct: (303) 796-4694<br>
Cell: (919) 619-4218<br>
Fax: (919) 461-1415<br>
andy_holland@urscorp.com</font>
<br><font size=2 face="sans-serif"><br>
</font>
<table>
<tr>
<td><font size=1 color=#4f4f4f face="sans-serif">This e-mail and any attachments
contain URS Corporation confidential information that may be proprietary
or privileged. If you receive this message in error or are not the intended
recipient, you should not retain, distribute, disclose or use any of this
information and you should destroy the e-mail and any attachments or copies.</font></table>
<br><font size=2 face="sans-serif"><br>
<br>
</font>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td width=48%><font size=1 face="sans-serif"><b>Darius Buntinas &lt;buntinas@mcs.anl.gov&gt;</b>
</font>
<br><font size=1 face="sans-serif">Sent by: mpich-discuss-bounces@mcs.anl.gov</font>
<p><font size=1 face="sans-serif">04/29/2011 11:14 AM</font>
<table border>
<tr valign=top>
<td bgcolor=white>
<div align=center><font size=1 face="sans-serif">Please respond to<br>
mpich-discuss@mcs.anl.gov</font></div></table>
<br>
<br>
<td width=51%>
<table width=100%>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">To</font></div>
<td><font size=1 face="sans-serif">mpich-discuss@mcs.anl.gov</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">cc</font></div>
<td>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">Subject</font></div>
<td><font size=1 face="sans-serif">Re: [mpich-discuss] Possible setup problem</font></table>
<br>
<table>
<tr valign=top>
<td>
<td></table>
<br></table>
<br>
<br>
<br><tt><font size=2><br>
Can you send us the command line you're using in both cases (where it works
and where it doesn't)?<br>
<br>
Thanks,<br>
-d<br>
<br>
On Apr 29, 2011, at 10:08 AM, Andy_Holland@URSCorp.com wrote:<br>
<br>
> <br>
> Darius, <br>
> There is quite a bit of output from the
program. When I pipe the standard output, the actual program never
starts; MPICH messages just fill the screen and keep going. It
does work just fine if I redirect the standard output to a file. <br>
> <br>
> <br>
> <br>
> <br>
> <br>
> <br>
> <br>
> Darius Buntinas &lt;buntinas@mcs.anl.gov&gt; <br>
> Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> 04/29/2011 11:04 AM<br>
> Please respond to<br>
> mpich-discuss@mcs.anl.gov<br>
> <br>
> <br>
> To<br>
> Andy_Holland@URSCorp.com, mpich-discuss@mcs.anl.gov<br>
> cc<br>
> Subject<br>
> Re: [mpich-discuss] Possible setup problem<br>
> <br>
> <br>
> <br>
> <br>
> <br>
> [Re-adding mpich-discuss]<br>
> <br>
> Is there a lot of output (e.g., a few pages, or a few MBs)? The
process manager is not designed to handle a lot of stdin/out traffic. If
you have a lot of data it's better to write it directly to a file.<br>
> <br>
> I think you said this was a Fortran program. I know there is
some trickiness with buffering I/O in Fortran. How do you know the
program is hanging? Does the program not finish in the expected time,
or do you just not see any output in the redirected file when you expect
it? If it's the latter, it could be that the output is being buffered,
in which case you might have to wait until the program terminates before
you see the output.<br>
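<br>
A sketch of the write-directly-to-a-file approach described above, assuming a Bourne-style shell. The job function is a hypothetical stand-in for the real mpiexec command line. The output goes straight to the log with no pipe in between, and the log can be followed from another terminal.<br>

```shell
# Hypothetical stand-in for the real mpiexec command line.
job() {
  for i in 1 2 3; do
    echo "step $i"
  done
}

log=$(mktemp)

# Send stdout and stderr straight to the log file, with no pipe in between,
# and run the job in the background.
job > "$log" 2>&1 &
pid=$!

# While the job runs, `tail -f "$log"` in another terminal follows the output.
wait "$pid"
cat "$log"
```

If the program buffers its output, lines may still show up in the log later than they were written.<br>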
> <br>
> -d<br>
> <br>
> On Apr 29, 2011, at 9:52 AM, Andy_Holland@URSCorp.com wrote:<br>
> <br>
> > <br>
> > The problem only occurs when I pipe the screen output to a log
file. If I don't do that, it runs fine. <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > Darius Buntinas &lt;buntinas@mcs.anl.gov&gt;<br>
> > 04/28/2011 05:15 PM <br>
> > <br>
> > To<br>
> > Andy_Holland@URSCorp.com<br>
> > cc<br>
> > Subject<br>
> > Re: [mpich-discuss] Possible setup problem<br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > <br>
> > It looks like the test program worked. <br>
> > <br>
> > Check whether your app works on one node. Also try other
applications to see if they work over two nodes.<br>
> > <br>
> > -d<br>
> > <br>
> > On Apr 28, 2011, at 3:38 PM, Andy_Holland@URSCorp.com wrote:<br>
> > <br>
> > > <br>
> > > We have modified some files on the machines and now when
I do 'host s051rhlapp01' it gives me the actual IP address of the machine.
I've attached the log file for your simple test after this correction.
I think it completed successfully, but wanted to check with you.
<br>
> > > <br>
> > > The model I'm trying to run using MPICH starts off fine
now, but then hangs at a certain point. I'm not sure whether there is still
a problem or not. <br>
> > > <br>
> > > <br>
> > > <br>
> > > Thanks, <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > Darius Buntinas &lt;buntinas@mcs.anl.gov&gt;<br>
> > > 04/27/2011 05:13 PM <br>
> > > <br>
> > > To<br>
> > > Andy_Holland@URSCorp.com<br>
> > > cc<br>
> > > Subject<br>
> > > Re: [mpich-discuss] Possible setup problem<br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > <br>
> > > The problem is that machine A is unable to determine what
its IP address is from its hostname. So if you do a<br>
> > > hostname<br>
> > > from machine A, it should return A (or A.foo.com). Then
you should be able to do<br>
> > > host A <br>
> > > (or "host A.foo.com") and get the IP address of
the machine. It looks like your machines are returning the loopback
address. It's possible that you just need to make sure that the /etc/hosts
file on each machine has _its_own_ name in there (the one returned by hostname)
and that it's set to the machine's actual IP address (and not 127.0.0.1).<br>
> > > <br>
> > > I'm not an expert in configuring networks, so I can't really
be more specific. Sorry.<br>
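<br>
For illustration, an /etc/hosts layout of the kind described above might look like the following sketch. The addresses (taken from the 192.0.2.0/24 documentation range) and the example.com domain are placeholders, not values from this thread; only the two host names come from it.<br>

```text
# /etc/hosts on s051rhlapp01 (addresses and domain are placeholders)
127.0.0.1    localhost localhost.localdomain
192.0.2.11   s051rhlapp01.example.com s051rhlapp01
192.0.2.12   s051rhlapp02.example.com s051rhlapp02
```

The point of the sketch is only that the machine's own name sits on a line with its real address rather than on the 127.0.0.1 line.<br>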
> > > <br>
> > > -d <br>
> > > <br>
> > > On Apr 27, 2011, at 4:06 PM, Andy_Holland@URSCorp.com wrote:<br>
> > > <br>
> > > > <br>
> > > > The /etc/hosts file only has the short names in it.
I'm not exactly sure what the networking issue is that I need to
let the sysadmin know about. Can you please explain it to me? <br>
> > > > <br>
> > > > Thanks, <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > Darius Buntinas &lt;buntinas@mcs.anl.gov&gt;<br>
> > > > 04/27/2011 04:53 PM <br>
> > > > <br>
> > > > To<br>
> > > > Andy_Holland@URSCorp.com<br>
> > > > cc<br>
> > > > Subject<br>
> > > > Re: [mpich-discuss] Possible setup problem<br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > <br>
> > > > How are the machines getting the IP address when using
the full name? If they're in /etc/hosts, then I would go ahead and
add the short names there. Otherwise, while adding the short names
there will work, there's another network configuration problem that's causing
this and may give you trouble in the future, so it might be worth it to
find a sysadmin to help you (I'm lucky enough to have great sysadmins here,
so I don't (have to) know too much about configuring networking).<br>
> > > > <br>
> > > > -d<br>
> > > > <br>
> > > > On Apr 27, 2011, at 3:46 PM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > <br>
> > > > > <br>
> > > > > I just tried doing the host command with the full
name of the machine including the domain and it is returning the correct
IP address for each machine. The /etc/hosts files on the machines
do not include the domain in the machine name. Maybe they should?
<br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > Darius Buntinas &lt;buntinas@mcs.anl.gov&gt;<br>
> > > > > 04/27/2011 02:58 PM <br>
> > > > > <br>
> > > > > To<br>
> > > > > Andy_Holland@URSCorp.com<br>
> > > > > cc<br>
> > > > > Subject<br>
> > > > > Re: [mpich-discuss] Possible setup problem<br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > <br>
> > > > > I think I found the problem. I should have
checked this earlier. It looks like your machines are set up to return
127.0.0.1 (the loopback address) when resolving their own hostname, rather
than their actual IP address.<br>
> > > > > <br>
> > > > > Try this on s051rhlapp01:<br>
> > > > > hostname<br>
> > > > > It should return s051rhlapp01. Then try:<br>
> > > > > host s051rhlapp01<br>
> > > > > It should NOT return 127.0.0.1. Then try
the same thing on s051rhlapp02 (using its own name).<br>
> > > > > <br>
> > > > > If you don't get what you should, it indicates
a problem with your network configuration.<br>
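<br>
The two checks above can be sketched as a small script, assuming a Linux system where getent is available (an assumption; the thread itself uses host). The classify_addr helper is hypothetical and only labels the address a name resolves to.<br>

```shell
# Hypothetical helper: label an address as unresolved, loopback, or ok.
classify_addr() {
  case "$1" in
    "")        echo "unresolved" ;;
    127.*|::1) echo "loopback" ;;
    *)         echo "ok" ;;
  esac
}

# Name this machine reports for itself.
name=$(hostname 2>/dev/null || echo unknown)

# getent resolves through the system's name service switch
# (typically /etc/hosts first, then DNS); keep the first address only.
addr=$(getent hosts "$name" 2>/dev/null | awk '{print $1; exit}')

echo "$name -> ${addr:-none}: $(classify_addr "$addr")"
```

A result of "loopback" on either machine would match the symptom described here.<br>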
> > > > > <br>
> > > > > -d<br>
> > > > > <br>
> > > > > On Apr 26, 2011, at 5:04 PM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > > <br>
> > > > > > <br>
> > > > > > Here ya go. <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > Darius Buntinas &lt;buntinas@mcs.anl.gov&gt;<br>
> > > > > > 04/26/2011 05:56 PM <br>
> > > > > > <br>
> > > > > > To<br>
> > > > > > Andy_Holland@URSCorp.com<br>
> > > > > > cc<br>
> > > > > > Subject<br>
> > > > > > Re: [mpich-discuss] Possible setup problem<br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > <br>
> > > > > > Oops I forgot to mention that you need to
recompile the simple_test file:<br>
> > > > > > <br>
> > > > > > mpicc simple_test.c -o simple_test<br>
> > > > > > <br>
> > > > > > Can you try it again?<br>
> > > > > > <br>
> > > > > > Thanks,<br>
> > > > > > -d<br>
> > > > > > <br>
> > > > > > On Apr 26, 2011, at 3:45 PM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > > > <br>
> > > > > > > <br>
> > > > > > > Ok, I applied the patch and rebuilt
both installations and reran your test program. Attached is the log
file. <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > Thank you, <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > Darius Buntinas &lt;buntinas@mcs.anl.gov&gt;<br>
> > > > > > > 04/26/2011 02:20 PM <br>
> > > > > > > <br>
> > > > > > > To<br>
> > > > > > > Andy_Holland@URSCorp.com<br>
> > > > > > > cc<br>
> > > > > > > Subject<br>
> > > > > > > Re: [mpich-discuss] Possible setup problem<br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > <br>
> > > > > > > Hmm. I found a bug with error
reporting. While this won't directly fix your problem, it may help
with identifying it.<br>
> > > > > > > <br>
> > > > > > > Can you apply this patch, then rebuild
and re-install mpich2 on both machines?<br>
> > > > > > > <br>
> > > > > > > (from the mpich2 source
directory)<br>
> > > > > > > patch -p0 &lt; errno.patch<br>
> > > > > > > make clean<br>
> > > > > > > make<br>
> > > > > > > make install<br>
> > > > > > > <br>
> > > > > > > Then try the simple_test.c again and
send us the log.<br>
> > > > > > > <br>
> > > > > > > Thanks,<br>
> > > > > > > -d<br>
> > > > > > > <br>
> > > > > > > [attachment "errno.patch"
deleted by Andy Holland/Denver/URSCorp] <br>
> > > > > > > <br>
> > > > > > > On Apr 26, 2011, at 11:28 AM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > Ok, I turned iptables off on both
machines and reran it. Attached is the log file. <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > Darius Buntinas &lt;buntinas@mcs.anl.gov&gt;
<br>
> > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > 04/26/2011 11:13 AM<br>
> > > > > > > > Please respond to<br>
> > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > To<br>
> > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > cc<br>
> > > > > > > > Subject<br>
> > > > > > > > Re: [mpich-discuss] Possible setup
problem<br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > <br>
> > > > > > > > For some reason, it's not showing
the specific socket error, but it's happening when a process on s051rhlapp02
tries to send a message to a process on s051rhlapp01. Can you try
disabling the firewalls on the machines and try it again?<br>
> > > > > > > > <br>
> > > > > > > > Thanks,<br>
> > > > > > > > -d<br>
> > > > > > > > <br>
> > > > > > > > On Apr 25, 2011, at 5:39 PM, Andy_Holland@URSCorp.com
wrote:<br>
> > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > Yeah, I put it in the wrong
directory. Ok, I reran in a shared area and I've attached the log
file. <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > Thanks, <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > Darius Buntinas &lt;buntinas@mcs.anl.gov&gt;
<br>
> > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > 04/25/2011 05:45 PM<br>
> > > > > > > > > Please respond to<br>
> > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > To<br>
> > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > cc<br>
> > > > > > > > > Subject<br>
> > > > > > > > > Re: [mpich-discuss] Possible
setup problem<br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > <br>
> > > > > > > > > Andy,<br>
> > > > > > > > > <br>
> > > > > > > > > Looking through the log file,
I see a line that says:<br>
> > > > > > > > > <br>
> > > > > > > > > [proxy:0:1@s051rhlapp02] launch_procs
(/usr/local/mpich2-1.3.2p1/src/pm/hydra/pm/pmiserv/pmip_cb.c:639): unable
to change wdir to /home/andy_holland/mpich_test (No such file or directory)<br>
> > > > > > > > > <br>
> > > > > > > > > Can you check that you can
access /home/andy_holland/mpich_test from s051rhlapp02 ?<br>
> > > > > > > > > <br>
> > > > > > > > > If not, put simple_test into
a directory that's accessible from both machines, and try it again.<br>
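<br>
One way to run that check, sketched under the assumption that ssh access between the machines is already set up. check_dir is a hypothetical helper, not part of MPICH2; any extra arguments are used as a command prefix, so the same test can run locally or through ssh.<br>

```shell
# Hypothetical helper.
# usage: check_dir DIR [CMD...]
# Runs `test -d DIR`, optionally through a command prefix such as `ssh HOST`.
check_dir() {
  dir=$1
  shift
  if "$@" test -d "$dir"; then
    echo "ok: $dir"
  else
    echo "missing: $dir"
    return 1
  fi
}

# Local check of the current working directory.
check_dir "$PWD"

# Remote check from the launching machine (left commented out, needs ssh):
# check_dir /home/andy_holland/mpich_test ssh s051rhlapp02
```

If the remote check reports the directory as missing, the launch directory is not shared between the machines.<br>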
> > > > > > > > > <br>
> > > > > > > > > Thanks,<br>
> > > > > > > > > -d<br>
> > > > > > > > > <br>
> > > > > > > > > On Apr 25, 2011, at 3:55 PM,
Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > Darius, <br>
> > > > > > > > > >
Thanks. If I had just thought for a second longer I would
have had it. Attached is the log file for your test program. <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > Darius Buntinas &lt;buntinas@mcs.anl.gov&gt;
<br>
> > > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > > 04/25/2011 04:32 PM<br>
> > > > > > > > > > Please respond to<br>
> > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > To<br>
> > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > cc<br>
> > > > > > > > > > Subject<br>
> > > > > > > > > > Re: [mpich-discuss] Possible
setup problem<br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > Sorry. Just run:<br>
> > > > > > > > > > mpicc simple_test.c
-o simple_test<br>
> > > > > > > > > > <br>
> > > > > > > > > > If you needed to specify
the full path for mpiexec, use the same path for mpicc. This will
generate the executable called simple_test.<br>
> > > > > > > > > > <br>
> > > > > > > > > > -d<br>
> > > > > > > > > > <br>
> > > > > > > > > > <br>
> > > > > > > > > > On Apr 25, 2011, at 3:26
PM, Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > Darius, <br>
> > > > > > > > > > >
Thanks for your help with this. You'll have to forgive me
though, I'm a Fortran programmer and I'm not exactly sure how to compile
the program you sent me. I have gcc by the way. <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > Thanks, <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > Darius Buntinas
&lt;buntinas@mcs.anl.gov&gt; <br>
> > > > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > > > 04/25/2011 03:19
PM<br>
> > > > > > > > > > > Please respond to<br>
> > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > To<br>
> > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > cc<br>
> > > > > > > > > > > Subject<br>
> > > > > > > > > > > Re: [mpich-discuss]
Possible setup problem<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > OK, can you try
the attached test program with the same number of processes and machine
file, but also add the -l option to mpiexec (to label the lines of output
with the rank).<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > Thanks,<br>
> > > > > > > > > > > -d<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > <br>
> > > > > > > > > > > [attachment "simple_test.c"
deleted by Andy Holland/Denver/URSCorp] <br>
> > > > > > > > > > > On Apr 25, 2011,
at 2:00 PM, Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > I've attached
the log for running cpi using the same machinefile. <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > Thank you,
<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > Darius Buntinas
&lt;buntinas@mcs.anl.gov&gt; <br>
> > > > > > > > > > > > Sent by: mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > > > > 04/25/2011
02:51 PM<br>
> > > > > > > > > > > > Please respond
to<br>
> > > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > To<br>
> > > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > cc<br>
> > > > > > > > > > > > Subject<br>
> > > > > > > > > > > > Re: [mpich-discuss]
Possible setup problem<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > </font></tt>
<br><tt><font size=2>> > > > > > > > > >
> > Hi Andy,<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > Can you try
running cpi from the examples directory of the MPICH2 source tree with
the same number of processes and the same machine file? Let us know
if that works, and, if not, send us the output, please.<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > Thanks,<br>
> > > > > > > > > > > > -d<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > On Apr 25,
2011, at 1:30 PM, Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > It was
suggested that I send out all the error messages. I've attached a
log file from the model. <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > Thank
you, <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > Dave Goodell
&lt;goodell@mcs.anl.gov&gt; <br>
> > > > > > > > > > > > > Sent by:
mpich-discuss-bounces@mcs.anl.gov<br>
> > > > > > > > > > > > > 04/25/2011
02:22 PM<br>
> > > > > > > > > > > > > Please
respond to<br>
> > > > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > To<br>
> > > > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > > cc<br>
> > > > > > > > > > > > > Subject<br>
> > > > > > > > > > > > > Re: [mpich-discuss]
Possible setup problem<br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > On Apr
25, 2011, at 12:59 PM CDT, Andy_Holland@URSCorp.com wrote:<br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > When
I run from either machine using CPUs from both machines the run stops with
many mpi messages. Below is the last message in the list: <br>
> > > > > > > > > > > > > > <br>
> > > > > > > > > > > > > > main
(/usr/local/mpich2-1.3.2p1/src/pm/hydra/ui/mpich/mpiexec.c:404): process
manager error waiting for completion <br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > Can you
send us all of the error messages? Typically the first error messages
are the most useful/relevant; the last ones often are just messages announcing
some sort of cleanup or secondary error caused by the original error.<br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > -Dave<br>
> > > > > > > > > > > > > <br>
> > > > > > > > > > > > > _______________________________________________<br>
> > > > > > > > > > > > > mpich-discuss
mailing list<br>
> > > > > > > > > > > > > mpich-discuss@mcs.anl.gov<br>
> > > > > > > > > > > > > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss<br>
<br>
_______________________________________________<br>
mpich-discuss mailing list<br>
mpich-discuss@mcs.anl.gov<br>
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss<br>
</font></tt>
<br>