[mpich-discuss] Possible setup problem
Andy_Holland at URSCorp.com
Fri Apr 29 10:18:04 CDT 2011
This command doesn't work:
run.cctm |& tee run.cctm.log
This command does work:
run.cctm > run.cct.log
The run.cctm file is the run script. This is the mpich command in that
script:
time /usr/local/mpich2/bin/mpirun -v -machinefile machine8 -np 16 $BASE/$EXEC
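
For reference, `|&` is csh (and bash 4+) shorthand that pipes both standard output and standard error, while a plain `>` redirects standard output only. The sketch below is one hedged way to capture both streams in a log file without putting a `tee` process on the stream. `noisy_job` and the log path are hypothetical stand-ins, not part of the real run script.

```shell
#!/bin/sh
# Hypothetical stand-in for the real run script: writes to both streams.
noisy_job() {
    echo "model output line"
    echo "launcher diagnostic line" >&2
}

# Send stdout and stderr straight to a file; no tee process sits between
# the job and the log. (csh spelling of the same redirect: cmd >& file)
log=${TMPDIR:-/tmp}/run_sketch.$$.log
noisy_job > "$log" 2>&1

# Inspect the log afterwards (or run `tail -f` on it from another terminal).
line_count=$(wc -l < "$log" | tr -d ' ')
echo "captured $line_count lines in $log"
```

With the real script this would correspond to `run.cctm > run.cctm.log 2>&1` in sh/bash, or `run.cctm >& run.cctm.log` in csh.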
Andy Holland
Air Quality Modeler
URS Corporation
1600 Perimeter Park Drive
Suite 400
Morrisville, NC 27560
Direct: (303) 796-4694
Cell: (919) 619-4218
Fax: (919) 461-1415
andy_holland at urscorp.com
This e-mail and any attachments contain URS Corporation confidential
information that may be proprietary or privileged. If you receive this
message in error or are not the intended recipient, you should not retain,
distribute, disclose or use any of this information and you should destroy
the e-mail and any attachments or copies.
Darius Buntinas <buntinas at mcs.anl.gov>
Sent by: mpich-discuss-bounces at mcs.anl.gov
04/29/2011 11:14 AM
Please respond to
mpich-discuss at mcs.anl.gov
To
mpich-discuss at mcs.anl.gov
cc
Subject
Re: [mpich-discuss] Possible setup problem
Can you send us the command line you're using in both cases (where it
works and where it doesn't)?
Thanks,
-d
On Apr 29, 2011, at 10:08 AM, Andy_Holland at URSCorp.com wrote:
>
> Darius,
> There is quite a bit of output from the program. When I pipe the
> standard output, the actual program never starts; MPICH messages just
> fill the screen and keep going. It works fine if I redirect the
> standard output to a file.
>
> Darius Buntinas <buntinas at mcs.anl.gov>
> Sent by: mpich-discuss-bounces at mcs.anl.gov
> 04/29/2011 11:04 AM
> Please respond to
> mpich-discuss at mcs.anl.gov
>
>
> To
> Andy_Holland at URSCorp.com, mpich-discuss at mcs.anl.gov
> cc
> Subject
> Re: [mpich-discuss] Possible setup problem
>
>
>
>
>
> [Re-adding mpich-discuss]
>
> Is there a lot of output (e.g., a few pages, or a few MBs)? The process
> manager is not designed to handle a lot of stdin/stdout traffic. If you
> have a lot of data, it's better to write it directly to a file.
>
> I think you said this was a Fortran program. I know there is some
> trickiness with buffered I/O in Fortran. How do you know the program is
> hanging? Does the program not finish in the expected time, or do you just
> not see any output in the redirected file when you expect it? If it's the
> latter, the output may be buffered, in which case you might have to wait
> until the program terminates before you see it.
>
> -d
>
> On Apr 29, 2011, at 9:52 AM, Andy_Holland at URSCorp.com wrote:
>
> >
> > The problem only occurs when I pipe the screen output to a log file.
If I don't do that, it runs fine.
> >
> > Darius Buntinas <buntinas at mcs.anl.gov>
> > 04/28/2011 05:15 PM
> >
> > To
> > Andy_Holland at URSCorp.com
> > cc
> > Subject
> > Re: [mpich-discuss] Possible setup problem
> >
> >
> >
> >
> >
> >
> > It looks like the test program worked.
> >
> > Check whether your app works on one node. Also try other applications
to see if they work over two nodes.
> >
> > -d
> >
> > On Apr 28, 2011, at 3:38 PM, Andy_Holland at URSCorp.com wrote:
> >
> > >
> > > We have modified some files on the machines, and now when I do
> > > 'host s051rhlapp01' it gives me the actual IP address of the machine.
> > > I've attached the log file for your simple test after this correction.
> > > I think it completed successfully, but wanted to check with you.
> > >
> > > The model I'm trying to run using MPICH starts off fine now, but
> > > then hangs at a certain point. I'm not sure whether there is still a
> > > problem or not.
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > 04/27/2011 05:13 PM
> > >
> > > To
> > > Andy_Holland at URSCorp.com
> > > cc
> > > Subject
> > > Re: [mpich-discuss] Possible setup problem
> > >
> > >
> > >
> > >
> > >
> > >
> > > The problem is that machine A is unable to determine what its IP
> > > address is from its hostname. So if you do a
> > >     hostname
> > > from machine A, it should return A (or A.foo.com). Then you should
> > > be able to do
> > >     host A
> > > (or "host A.foo.com") and get the IP address of the machine. It
> > > looks like your machines are returning the loopback address. It's
> > > possible that you just need to make sure that the /etc/hosts file on
> > > each machine has _its_own_ name in there (the one returned by hostname)
> > > and that it's set to the machine's actual IP address (and not 127.0.0.1).
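
The check described above can be sketched as a small script to run on each node. This is only an illustration under stated assumptions: the function names are hypothetical, and `getent hosts` is used instead of `host` because it follows the system resolver order (including /etc/hosts), whereas `host` queries DNS directly.

```shell
#!/bin/sh
# Return success if the given address is a loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# Resolve a name through the system resolver (/etc/hosts and DNS, in the
# order set by nsswitch.conf) and print the first address, if any.
resolve_first() {
    getent hosts "$1" 2>/dev/null | awk 'NR == 1 { print $1 }'
}

# Report whether this machine's own hostname resolves to a usable address.
check_self() {
    name=$(hostname)
    addr=$(resolve_first "$name")
    if [ -z "$addr" ]; then
        echo "$name: does not resolve"
    elif is_loopback "$addr"; then
        echo "$name: resolves to loopback ($addr); check /etc/hosts"
    else
        echo "$name: resolves to $addr"
    fi
}

check_self
```

A node that reports a loopback address here is in the state described above, where its own name maps to 127.0.0.1 rather than its actual IP address.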
> > >
> > > I'm not an expert in configuring networks, so I can't really be more
specific. Sorry.
> > >
> > > -d
> > >
> > > On Apr 27, 2011, at 4:06 PM, Andy_Holland at URSCorp.com wrote:
> > >
> > > >
> > > > The /etc/hosts file only has the short names in it. I'm not
exactly sure what the networking issue is that I need to let the sysadmin
know about. Can you please explain it to me?
> > > >
> > > > Thanks,
> > > >
> > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > 04/27/2011 04:53 PM
> > > >
> > > > To
> > > > Andy_Holland at URSCorp.com
> > > > cc
> > > > Subject
> > > > Re: [mpich-discuss] Possible setup problem
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > How are the machines getting the IP address when using the full
> > > > name? If they're in /etc/hosts, then I would go ahead and add the
> > > > short names there. Otherwise, while adding the short names there will
> > > > work, there's another network configuration problem that's causing
> > > > this and may give you trouble in the future, so it might be worth
> > > > finding a sysadmin to help you. (I'm lucky enough to have great
> > > > sysadmins here, so I don't (have to) know too much about configuring
> > > > networking.)
> > > >
> > > > -d
> > > >
> > > > On Apr 27, 2011, at 3:46 PM, Andy_Holland at URSCorp.com wrote:
> > > >
> > > > >
> > > > > I just tried doing the host command with the full name of the
machine including the domain and it is returning the correct IP address
for each machine. The /etc/hosts files on the machines do not include the
domain in the machine name. Maybe they should?
> > > > >
> > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > 04/27/2011 02:58 PM
> > > > >
> > > > > To
> > > > > Andy_Holland at URSCorp.com
> > > > > cc
> > > > > Subject
> > > > > Re: [mpich-discuss] Possible setup problem
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > I think I found the problem. I should have checked this
> > > > > earlier. It looks like your machines are set up to return 127.0.0.1
> > > > > (the loopback address) when resolving their own hostname, rather
> > > > > than their actual IP address.
> > > > >
> > > > > Try this on s051rhlapp01:
> > > > >     hostname
> > > > > It should return s051rhlapp01. Then try:
> > > > >     host s051rhlapp01
> > > > > It should NOT return 127.0.0.1. Then try the same thing on
> > > > > s051rhlapp02 (using its own name).
> > > > >
> > > > > If you don't get what you should, it indicates a problem with
> > > > > your network configuration.
> > > > >
> > > > > -d
> > > > >
> > > > > On Apr 26, 2011, at 5:04 PM, Andy_Holland at URSCorp.com wrote:
> > > > >
> > > > > >
> > > > > > Here ya go.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > 04/26/2011 05:56 PM
> > > > > >
> > > > > > To
> > > > > > Andy_Holland at URSCorp.com
> > > > > > cc
> > > > > > Subject
> > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Oops I forgot to mention that you need to recompile the
simple_test file:
> > > > > >
> > > > > > mpicc simple_test.c -o simple_test
> > > > > >
> > > > > > Can you try it again?
> > > > > >
> > > > > > Thanks,
> > > > > > -d
> > > > > >
> > > > > > On Apr 26, 2011, at 3:45 PM, Andy_Holland at URSCorp.com wrote:
> > > > > >
> > > > > > >
> > > > > > > Ok, I applied the patch and rebuilt both installations and
reran your test program. Attached is the log file.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thank you,
> > > > > > >
> > > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > > 04/26/2011 02:20 PM
> > > > > > >
> > > > > > > To
> > > > > > > Andy_Holland at URSCorp.com
> > > > > > > cc
> > > > > > > Subject
> > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Hmm. I found a bug with error reporting. While this won't
directly fix your problem, it may help with identifying it.
> > > > > > >
> > > > > > > Can you apply this patch, then rebuild and re-install mpich2
on both machines?
> > > > > > >
> > > > > > > (from the mpich2 source directory)
> > > > > > > patch -p0 < errno.patch
> > > > > > > make clean
> > > > > > > make
> > > > > > > make install
> > > > > > >
> > > > > > > Then try the simple_test.c again and send us the log.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > -d
> > > > > > >
> > > > > > > [attachment "errno.patch" deleted by Andy
Holland/Denver/URSCorp]
> > > > > > >
> > > > > > > On Apr 26, 2011, at 11:28 AM, Andy_Holland at URSCorp.com
wrote:
> > > > > > >
> > > > > > > >
> > > > > > > > Ok, I turned iptables off on both machines and reran it.
Attached is the log file.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > 04/26/2011 11:13 AM
> > > > > > > > Please respond to
> > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > >
> > > > > > > >
> > > > > > > > To
> > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > cc
> > > > > > > > Subject
> > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > For some reason, it's not showing the specific socket
error, but it's happening when a process on s051rhlapp02 tries to send a
message to a process on s051rhlapp01. Can you try disabling the firewalls
on the machines and try it again?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > -d
> > > > > > > >
> > > > > > > > On Apr 25, 2011, at 5:39 PM, Andy_Holland at URSCorp.com
wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Yeah, I put it in the wrong directory. Ok, I reran in a
shared area and I've attached the log file.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > 04/25/2011 05:45 PM
> > > > > > > > > Please respond to
> > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > To
> > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > cc
> > > > > > > > > Subject
> > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Andy,
> > > > > > > > >
> > > > > > > > > Looking through the log file, I see a line that says:
> > > > > > > > >
> > > > > > > > > [proxy:0:1@s051rhlapp02] launch_procs (/usr/local/mpich2-1.3.2p1/src/pm/hydra/pm/pmiserv/pmip_cb.c:639): unable to change wdir to /home/andy_holland/mpich_test (No such file or directory)
> > > > > > > > >
> > > > > > > > > Can you check that you can access /home/andy_holland/mpich_test
> > > > > > > > > from s051rhlapp02?
> > > > > > > > >
> > > > > > > > > If not, put simple_test into a directory that's accessible
> > > > > > > > > from both machines, and try it again.
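
One hedged way to run that check across every host in the machine file is sketched below. It assumes passwordless ssh between the nodes and a machine file with one host per line, optionally followed by `:nprocs`. The function names and the `REMOTE` variable are hypothetical helpers introduced for this illustration.

```shell
#!/bin/sh
# Command used to run something on a remote host. Assumption: passwordless
# ssh works between the nodes. Override REMOTE to try this without a network.
REMOTE=${REMOTE:-ssh}

# Print each distinct host named in a machine file, dropping comments,
# blank lines, and an optional ":nprocs" suffix (assumed format).
hosts_in() {
    awk '{ sub(/#.*/, ""); if (NF) { split($1, a, ":"); print a[1] } }' "$1" | sort -u
}

# Check that a directory exists on every host listed in the machine file.
# Paths containing spaces would need extra quoting on the remote side.
check_dir() {
    dir=$1
    machinefile=$2
    status=0
    for host in $(hosts_in "$machinefile"); do
        if $REMOTE "$host" test -d "$dir" < /dev/null; then
            echo "$host: ok"
        else
            echo "$host: cannot access $dir"
            status=1
        fi
    done
    return $status
}

# Example call (not executed here):
#   check_dir /home/andy_holland/mpich_test machine8
```

Any host reported as unable to access the directory would produce the same "unable to change wdir" failure when the launcher starts processes there.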
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > -d
> > > > > > > > >
> > > > > > > > > On Apr 25, 2011, at 3:55 PM, Andy_Holland at URSCorp.com
wrote:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Darius,
> > > > > > > > > > Thanks. If I had just thought for a second longer I would
> > > > > > > > > > have had it. Attached is the log file for your test program.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > 04/25/2011 04:32 PM
> > > > > > > > > > Please respond to
> > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > To
> > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > cc
> > > > > > > > > > Subject
> > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Sorry. Just run:
> > > > > > > > > > mpicc simple_test.c -o simple_test
> > > > > > > > > >
> > > > > > > > > > If you needed to specify the full path for mpiexec,
use the same path for mpicc. This will generate the executable called
simple_test.
> > > > > > > > > >
> > > > > > > > > > -d
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Apr 25, 2011, at 3:26 PM, Andy_Holland at URSCorp.com
wrote:
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Darius,
> > > > > > > > > > > Thanks for your help with this. You'll have
to forgive me though, I'm a Fortran programmer and I'm not exactly sure
how to compile the program you sent me. I have gcc by the way.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > > 04/25/2011 03:19 PM
> > > > > > > > > > > Please respond to
> > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > To
> > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > cc
> > > > > > > > > > > Subject
> > > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > OK, can you try the attached test program with the
same number of processes and machine file, but also add the -l option to
mpiexec (to label the lines of output with the rank).
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > -d
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > [attachment "simple_test.c" deleted by Andy
Holland/Denver/URSCorp]
> > > > > > > > > > > On Apr 25, 2011, at 2:00 PM,
Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I've attached the log for running cpi using the
same machinefile.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you,
> > > > > > > > > > > >
> > > > > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > > > 04/25/2011 02:51 PM
> > > > > > > > > > > > Please respond to
> > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > To
> > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > cc
> > > > > > > > > > > > Subject
> > > > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Andy,
> > > > > > > > > > > >
> > > > > > > > > > > > Can you try running cpi from the examples
directory of the MPICH2 source tree with the same number of processes and
the same machine file? Let us know if that works, and, if not, send us
the output, please.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > -d
> > > > > > > > > > > >
> > > > > > > > > > > > On Apr 25, 2011, at 1:30 PM,
Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > It was suggested that I send out all the error
messages. I've attached a log file from the model.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Dave Goodell <goodell at mcs.anl.gov>
> > > > > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > > > > 04/25/2011 02:22 PM
> > > > > > > > > > > > > Please respond to
> > > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > To
> > > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > > cc
> > > > > > > > > > > > > Subject
> > > > > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Apr 25, 2011, at 12:59 PM CDT,
Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > When I run from either machine using CPUs from both
> > > > > > > > > > > > > > machines, the run stops with many MPI messages. Below is
> > > > > > > > > > > > > > the last message in the list:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > main (/usr/local/mpich2-1.3.2p1/src/pm/hydra/ui/mpich/mpiexec.c:404): process manager error waiting for completion
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can you send us all of the error messages?
Typically the first error messages are the most useful/relevant; the last
ones often are just messages announcing some sort of cleanup or secondary
error caused by the original error.
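
As a rough illustration of looking at the earliest errors first, the sketch below prints the first few matching lines of a log with their line numbers. The function name and the keyword list are assumptions made for this example, not MPICH conventions.

```shell
#!/bin/sh
# Print the first few lines of a log that look like errors, with line
# numbers, so the original failure is seen before later cleanup messages.
# The keyword list is an assumption; adjust it to the log at hand.
first_errors() {
    logfile=$1
    count=${2:-5}
    grep -n -i -E 'error|unable|fail' "$logfile" | head -n "$count"
}

# Example call (not executed here):
#   first_errors run.cctm.log 10
```

The line numbers make it easy to open the log at the first reported problem and read forward from there.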
> > > > > > > > > > > > >
> > > > > > > > > > > > > -Dave
> > > > > > > > > > > > >
> > > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > > mpich-discuss mailing list
> > > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > >
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss