[mpich-discuss] Possible setup problem

Andy_Holland at URSCorp.com Andy_Holland at URSCorp.com
Fri Apr 29 10:18:04 CDT 2011


This command doesn't work:

run.cctm |& tee run.cctm.log

This command does work:

run.cctm > run.cct.log

The run.cctm file is the run script.  This is the mpich command in that 
script:

time /usr/local/mpich2/bin/mpirun -v -machinefile machine8 -np 16 $BASE/$EXEC
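
One way to narrow this down may be to capture the two output streams separately: in csh, `|&` sends both stdout and stderr through the pipe, while `>` redirects only stdout. The sketch below uses POSIX sh rather than csh, and `fake_job` is a hypothetical stand-in for run.cctm, not the actual run script. It also assumes `mktemp -d` is available.

```shell
#!/bin/sh
# Hypothetical stand-in for run.cctm: one line on stdout, one on stderr.
fake_job() {
    echo "model output"
    echo "launcher message" >&2
}

# Scratch directory for the two logs (mktemp -d is assumed to be available).
logdir=$(mktemp -d)

# Write each stream directly to its own file instead of piping through tee.
fake_job > "$logdir/job.out" 2> "$logdir/job.err"

# Show what ended up in each log.
echo "stdout log: $(cat "$logdir/job.out")"
echo "stderr log: $(cat "$logdir/job.err")"
```

With the real script under sh or bash, the corresponding form would be `run.cctm > run.cctm.out 2> run.cctm.err`; the two file names are only illustrative.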

Andy Holland
Air Quality Modeler
URS Corporation
1600 Perimeter Park Drive
Suite 400
Morrisville, NC 27560
Direct: (303) 796-4694
Cell: (919) 619-4218
Fax: (919) 461-1415
andy_holland at urscorp.com


This e-mail and any attachments contain URS Corporation confidential 
information that may be proprietary or privileged. If you receive this 
message in error or are not the intended recipient, you should not retain, 
distribute, disclose or use any of this information and you should destroy 
the e-mail and any attachments or copies.






Darius Buntinas <buntinas at mcs.anl.gov> 
Sent by: mpich-discuss-bounces at mcs.anl.gov
04/29/2011 11:14 AM
Please respond to
mpich-discuss at mcs.anl.gov



To
mpich-discuss at mcs.anl.gov
cc

Subject
Re: [mpich-discuss] Possible setup problem







Can you send us the command line you're using in both cases (where it 
works and where it doesn't)?

Thanks,
-d

On Apr 29, 2011, at 10:08 AM, Andy_Holland at URSCorp.com wrote:

> 
> Darius,
> There is quite a bit of output from the program. When I pipe the standard output, the actual program never starts; MPICH messages just fill the screen and keep going. It works fine if I redirect the standard output to a file.
> 
> Andy Holland
> 
> 
> 
> 
> 
> 
> Darius Buntinas <buntinas at mcs.anl.gov> 
> Sent by: mpich-discuss-bounces at mcs.anl.gov
> 04/29/2011 11:04 AM
> Please respond to
> mpich-discuss at mcs.anl.gov
> 
> 
> To
> Andy_Holland at URSCorp.com, mpich-discuss at mcs.anl.gov
> cc
> Subject
> Re: [mpich-discuss] Possible setup problem
> 
> 
> 
> 
> 
> [Re-adding mpich-discuss]
> 
> Is there a lot of output (e.g., a few pages, or a few MB)? The process manager is not designed to handle a lot of stdin/stdout traffic. If you have a lot of data, it's better to write it directly to a file.
> 
> I think you said this was a Fortran program. I know there is some trickiness with buffering I/O in Fortran. How do you know the program is hanging? Does the program not finish in the expected time, or do you just not see any output in the redirected file when you expect it? If it's the latter, the output may be buffered, in which case you might have to wait until the program terminates before you see it.
> 
> -d
> 
> On Apr 29, 2011, at 9:52 AM, Andy_Holland at URSCorp.com wrote:
> 
> > 
> > The problem only occurs when I pipe the screen output to a log file. If I don't do that, it runs fine.
> > 
> > Andy Holland
> > 
> > 
> > 
> > 
> > 
> > 
> > Darius Buntinas <buntinas at mcs.anl.gov>
> > 04/28/2011 05:15 PM 
> > 
> > To
> > Andy_Holland at URSCorp.com
> > cc
> > Subject
> > Re: [mpich-discuss] Possible setup problem
> > 
> > 
> > 
> > 
> > 
> > 
> > It looks like the test program worked. 
> > 
> > Check whether your app works on one node. Also try other applications to see if they work over two nodes.
> > 
> > -d
> > 
> > On Apr 28, 2011, at 3:38 PM, Andy_Holland at URSCorp.com wrote:
> > 
> > > 
> > > We have modified some files on the machines, and now when I do 'host s051rhlapp01' it gives me the actual IP address of the machine. I've attached the log file for your simple test after this correction. I think it completed successfully, but wanted to check with you.
> > > 
> > > The model I'm trying to run using MPICH starts off fine now, but then hangs at a certain point. I'm not sure whether there is still a problem.
> > > 
> > > 
> > > 
> > > Thanks, 
> > > 
> > > Andy Holland
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > 04/27/2011 05:13 PM 
> > > 
> > > To
> > > Andy_Holland at URSCorp.com
> > > cc
> > > Subject
> > > Re: [mpich-discuss] Possible setup problem
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > The problem is that machine A is unable to determine what its IP address is from its hostname. So if you do a
> > >    hostname
> > > from machine A, it should return A (or A.foo.com). Then you should be able to do
> > >    host A
> > > (or "host A.foo.com") and get the IP address of the machine. It looks like your machines are returning the loopback address. It's possible that you just need to make sure that the /etc/hosts file on each machine has _its_own_ name in there (the one returned by hostname) and that it's set to the machine's actual IP address (and not 127.0.0.1).
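
The check described above can be sketched as a small script. This is only an illustration under stated assumptions: it uses `getent hosts` (assumed to be available, as on glibc-based Linux systems), which consults /etc/hosts and DNS according to the system's resolver configuration, instead of the `host` command mentioned in the message. The function names `is_loopback` and `check_own_name` are hypothetical, and only IPv4 loopback addresses are recognized.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given IPv4 address is in 127.0.0.0/8.
is_loopback() {
    case "$1" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Look up this machine's own name and report what it resolves to.
# getent is an assumption here; uname -n is used if hostname is missing.
check_own_name() {
    name=$(hostname 2>/dev/null || uname -n)
    addr=$(getent hosts "$name" 2>/dev/null | awk 'NR == 1 { print $1 }')
    if [ -z "$addr" ]; then
        echo "$name: does not resolve"
    elif is_loopback "$addr"; then
        echo "$name: resolves to loopback address $addr"
    else
        echo "$name: resolves to $addr"
    fi
}

check_own_name
```

Run on each machine in turn. A line reporting a loopback address would correspond to the situation described above, where the fix is an /etc/hosts entry mapping the machine's own name to its real address, for example a hypothetical line such as `192.0.2.11 s051rhlapp01`.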
> > > 
> > > I'm not an expert in configuring networks, so I can't really be more specific. Sorry.
> > > 
> > > -d 
> > > 
> > > On Apr 27, 2011, at 4:06 PM, Andy_Holland at URSCorp.com wrote:
> > > 
> > > > 
> > > > The /etc/hosts file only has the short names in it. I'm not exactly sure what the networking issue is that I need to let the sysadmin know about. Can you please explain it to me?
> > > > 
> > > > Thanks, 
> > > > 
> > > > Andy Holland
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > 04/27/2011 04:53 PM 
> > > > 
> > > > To
> > > > Andy_Holland at URSCorp.com
> > > > cc
> > > > Subject
> > > > Re: [mpich-discuss] Possible setup problem
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > How are the machines getting the IP address when using the full name? If they're in /etc/hosts, then I would go ahead and add the short names there. Otherwise, while adding the short names there will work, there's another network configuration problem that's causing this and may give you trouble in the future, so it might be worth finding a sysadmin to help you. (I'm lucky enough to have great sysadmins here, so I don't (have to) know too much about configuring networking.)
> > > > 
> > > > -d
> > > > 
> > > > On Apr 27, 2011, at 3:46 PM, Andy_Holland at URSCorp.com wrote:
> > > > 
> > > > > 
> > > > > I just tried doing the host command with the full name of the machine, including the domain, and it returns the correct IP address for each machine. The /etc/hosts files on the machines do not include the domain in the machine name. Maybe they should?
> > > > > 
> > > > > Andy Holland
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > 04/27/2011 02:58 PM 
> > > > > 
> > > > > To
> > > > > Andy_Holland at URSCorp.com
> > > > > cc
> > > > > Subject
> > > > > Re: [mpich-discuss] Possible setup problem
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > I think I found the problem. I should have checked this earlier. It looks like your machines are set up to return 127.0.0.1 (the loopback address) when resolving their own hostname, rather than their actual IP address.
> > > > > 
> > > > > Try this on s051rhlapp01:
> > > > >  hostname
> > > > > It should return s051rhlapp01. Then try:
> > > > >  host s051rhlapp01
> > > > > It should NOT return 127.0.0.1. Then try the same thing on s051rhlapp02 (using its own name).
> > > > > 
> > > > > If you don't get what you should, it indicates a problem with your network configuration.
> > > > > 
> > > > > -d
> > > > > 
> > > > > On Apr 26, 2011, at 5:04 PM, Andy_Holland at URSCorp.com wrote:
> > > > > 
> > > > > > 
> > > > > > Here ya go. 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Andy Holland
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > 04/26/2011 05:56 PM 
> > > > > > 
> > > > > > To
> > > > > > Andy_Holland at URSCorp.com
> > > > > > cc
> > > > > > Subject
> > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Oops, I forgot to mention that you need to recompile the simple_test file:
> > > > > > 
> > > > > >  mpicc simple_test.c -o simple_test
> > > > > > 
> > > > > > Can you try it again?
> > > > > > 
> > > > > > Thanks,
> > > > > > -d
> > > > > > 
> > > > > > On Apr 26, 2011, at 3:45 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > 
> > > > > > > 
> > > > > > > Ok, I applied the patch and rebuilt both installations and reran your test program. Attached is the log file.
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Thank you, 
> > > > > > > 
> > > > > > > Andy Holland
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > > 04/26/2011 02:20 PM 
> > > > > > > 
> > > > > > > To
> > > > > > > Andy_Holland at URSCorp.com
> > > > > > > cc
> > > > > > > Subject
> > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Hmm. I found a bug with error reporting. While this won't directly fix your problem, it may help with identifying it.
> > > > > > > 
> > > > > > > Can you apply this patch, then rebuild and re-install mpich2 on both machines?
> > > > > > > 
> > > > > > >    (from the mpich2 source directory)
> > > > > > >    patch -p0 < errno.patch
> > > > > > >    make clean
> > > > > > >    make
> > > > > > >    make install
> > > > > > > 
> > > > > > > Then try the simple_test.c again and send us the log.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > -d
> > > > > > > 
> > > > > > > [attachment "errno.patch" deleted by Andy Holland/Denver/URSCorp]
> > > > > > > 
> > > > > > > On Apr 26, 2011, at 11:28 AM, Andy_Holland at URSCorp.com wrote:
> > > > > > > 
> > > > > > > > 
> > > > > > > > Ok, I turned iptables off on both machines and reran it. Attached is the log file.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Andy Holland
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > 04/26/2011 11:13 AM
> > > > > > > > Please respond to
> > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > 
> > > > > > > > 
> > > > > > > > To
> > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > cc
> > > > > > > > Subject
> > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > For some reason, it's not showing the specific socket error, but it's happening when a process on s051rhlapp02 tries to send a message to a process on s051rhlapp01. Can you try disabling the firewalls on the machines and try it again?
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > -d
> > > > > > > > 
> > > > > > > > On Apr 25, 2011, at 5:39 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Yeah, I put it in the wrong directory. Ok, I reran in a shared area and I've attached the log file.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Thanks, 
> > > > > > > > > 
> > > > > > > > > Andy Holland
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > 04/25/2011 05:45 PM
> > > > > > > > > Please respond to
> > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > To
> > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > cc
> > > > > > > > > Subject
> > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Andy,
> > > > > > > > > 
> > > > > > > > > Looking through the log file, I see a line that says:
> > > > > > > > > 
> > > > > > > > > [proxy:0:1 at s051rhlapp02] launch_procs (/usr/local/mpich2-1.3.2p1/src/pm/hydra/pm/pmiserv/pmip_cb.c:639): unable to change wdir to /home/andy_holland/mpich_test (No such file or directory)
> > > > > > > > > 
> > > > > > > > > Can you check that you can access /home/andy_holland/mpich_test from s051rhlapp02?
> > > > > > > > > 
> > > > > > > > > If not, put simple_test into a directory that's accessible from both machines, and try it again.
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > -d
> > > > > > > > > 
> > > > > > > > > On Apr 25, 2011, at 3:55 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Darius,
> > > > > > > > > > Thanks. If I had just thought for a second longer I would have had it. Attached is the log file for your test program.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Andy Holland
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > 04/25/2011 04:32 PM
> > > > > > > > > > Please respond to
> > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > To
> > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > cc
> > > > > > > > > > Subject
> > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Sorry.  Just run:
> > > > > > > > > >    mpicc simple_test.c -o simple_test
> > > > > > > > > > 
> > > > > > > > > > If you needed to specify the full path for mpiexec, use the same path for mpicc. This will generate the executable called simple_test.
> > > > > > > > > > 
> > > > > > > > > > -d
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > On Apr 25, 2011, at 3:26 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Darius,
> > > > > > > > > > > Thanks for your help with this. You'll have to forgive me, though: I'm a Fortran programmer and I'm not exactly sure how to compile the program you sent me. I have gcc, by the way.
> > > > > > > > > > > 
> > > > > > > > > > > Thanks, 
> > > > > > > > > > > 
> > > > > > > > > > > Andy Holland
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > > 04/25/2011 03:19 PM
> > > > > > > > > > > Please respond to
> > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > To
> > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > cc
> > > > > > > > > > > Subject
> > > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > OK, can you try the attached test program with the same number of processes and machine file, but also add the -l option to mpiexec (to label the lines of output with the rank)?
> > > > > > > > > > > 
> > > > > > > > > > > Thanks,
> > > > > > > > > > > -d
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > [attachment "simple_test.c" deleted by Andy Holland/Denver/URSCorp]
> > > > > > > > > > > On Apr 25, 2011, at 2:00 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > I've attached the log for running cpi using the same machinefile.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Thank you, 
> > > > > > > > > > > > 
> > > > > > > > > > > > Andy Holland
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > > > 04/25/2011 02:51 PM
> > > > > > > > > > > > Please respond to
> > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > To
> > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > cc
> > > > > > > > > > > > Subject
> > > > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Hi Andy,
> > > > > > > > > > > > 
> > > > > > > > > > > > Can you try running cpi from the examples directory of the MPICH2 source tree with the same number of processes and the same machine file? Let us know if that works, and, if not, send us the output, please.
> > > > > > > > > > > > 
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > -d
> > > > > > > > > > > > 
> > > > > > > > > > > > On Apr 25, 2011, at 1:30 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > It was suggested that I send out all the error messages. I've attached a log file from the model.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Thank you, 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Andy Holland
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Dave Goodell <goodell at mcs.anl.gov> 
> > > > > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > > > > 04/25/2011 02:22 PM
> > > > > > > > > > > > > Please respond to
> > > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > To
> > > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > > cc
> > > > > > > > > > > > > Subject
> > > > > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Apr 25, 2011, at 12:59 PM CDT, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > When I run from either machine using CPUs from both machines, the run stops with many MPI messages. Below is the last message in the list:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > main (/usr/local/mpich2-1.3.2p1/src/pm/hydra/ui/mpich/mpiexec.c:404): process manager error waiting for completion
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Can you send us all of the error messages? Typically the first error messages are the most useful/relevant; the last ones often are just messages announcing some sort of cleanup or secondary error caused by the original error.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > -Dave
> > > > > > > > > > > > > 

_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
