[mpich-discuss] Possible setup problem

Darius Buntinas buntinas at mcs.anl.gov
Fri Apr 29 10:27:06 CDT 2011


tee does its own buffering as well.  Are you sure the program hangs?  Did you wait long enough for it to finish running?

I recently had a similar issue with a long-running script.  I thought it hung, but tee just buffered the output until the program terminated, so I didn't see any output on the screen or in the file until the end.

-d
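
A note on why piped output can appear late: many runtime libraries choose their buffering mode by checking whether standard output is a terminal, and both "| tee" and "> file" count as "not a terminal". The following is only a minimal shell sketch of that check; stdout_kind is a made-up helper name for illustration and is not part of MPICH or tee.

```shell
#!/bin/sh
# stdout_kind is a hypothetical helper, for illustration only.
# [ -t 1 ] is true when file descriptor 1 (stdout) is a terminal.
stdout_kind() {
    if [ -t 1 ]; then
        echo "terminal"
    else
        echo "pipe-or-file"
    fi
}

stdout_kind          # result depends on where the script itself is run
stdout_kind | cat    # stdout is a pipe here, as it is in front of tee
```

If buffering is the suspect, watching the job with top or ps shows whether it is still using CPU even though no output has appeared yet.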

On Apr 29, 2011, at 10:18 AM, Andy_Holland at URSCorp.com wrote:

> 
> This command doesn't work: 
> 
> run.cctm |& tee run.cctm.log 
> 
> This command does work: 
> 
> run.cctm > run.cct.log 
> 
> The run.cctm file is the run script.  This is the mpich command in that script: 
> 
> time /usr/local/mpich2/bin/mpirun -v -machinefile machine8 -np 16 $BASE/$EXEC 
> 
> Andy Holland
> Air Quality Modeler
> URS Corporation
> 1600 Perimeter Park Drive
> Suite 400
> Morrisville, NC 27560
> Direct: (303) 796-4694
> Cell: (919) 619-4218
> Fax: (919) 461-1415
> andy_holland at urscorp.com 
> 
> This e-mail and any attachments contain URS Corporation confidential information that may be proprietary or privileged. If you receive this message in error or are not the intended recipient, you should not retain, distribute, disclose or use any of this information and you should destroy the e-mail and any attachments or copies.
> 
> 
> 
> 
> 
> 
> Darius Buntinas <buntinas at mcs.anl.gov> 
> Sent by: mpich-discuss-bounces at mcs.anl.gov
> 04/29/2011 11:14 AM
> Please respond to
> mpich-discuss at mcs.anl.gov
> 
> 
> To
> mpich-discuss at mcs.anl.gov
> cc
> Subject
> Re: [mpich-discuss] Possible setup problem
> 
> 
> 
> 
> 
> 
> Can you send us the command line you're using in both cases (where it works and where it doesn't)?
> 
> Thanks,
> -d
> 
> On Apr 29, 2011, at 10:08 AM, Andy_Holland at URSCorp.com wrote:
> 
> > 
> > Darius, 
> >         There is quite a bit of output from the program.  When I pipe the standard output, the actual program never starts; MPICH messages just fill the screen and keep going and going.  It does work just fine if I redirect the standard output to a file. 
> > 
> > Darius Buntinas <buntinas at mcs.anl.gov> 
> > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > 04/29/2011 11:04 AM
> > Please respond to
> > mpich-discuss at mcs.anl.gov
> > 
> > 
> > To
> > Andy_Holland at URSCorp.com, mpich-discuss at mcs.anl.gov
> > cc
> > Subject
> > Re: [mpich-discuss] Possible setup problem
> > 
> > 
> > 
> > 
> > 
> > [Re-adding mpich-discuss]
> > 
> > Is there a lot of output (e.g., a few pages, or a few MBs)?  The process manager is not designed to handle a lot of stdin/out traffic.  If you have a lot of data it's better to write it directly to a file.
> > 
> > I think you said this was a Fortran program.  I know there is some trickiness with buffering I/O in Fortran.  How do you know the program is hanging?  Does the program not finish in the expected time, or do you just not see any output in the redirected file when you expect it?  If it's the latter, it could be that the output is being buffered, in which case you might have to wait until the program terminates before you see the output.
> > 
> > -d
> > 
> > On Apr 29, 2011, at 9:52 AM, Andy_Holland at URSCorp.com wrote:
> > 
> > > 
> > > The problem only occurs when I pipe the screen output to a log file.  If I don't do that, it runs fine. 
> > > 
> > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > 04/28/2011 05:15 PM 
> > > 
> > > To
> > > Andy_Holland at URSCorp.com
> > > cc
> > > Subject
> > > Re: [mpich-discuss] Possible setup problem
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > It looks like the test program worked.  
> > > 
> > > Check whether your app works on one node.  Also try other applications to see if they work over two nodes.
> > > 
> > > -d
> > > 
> > > On Apr 28, 2011, at 3:38 PM, Andy_Holland at URSCorp.com wrote:
> > > 
> > > > 
> > > > We have modified some files on the machines and now when I do 'host s051rhlapp01' it gives me the actual IP address of the machine.  I've attached the log file for your simple test after this correction.  I think it completed successfully, but wanted to check with you. 
> > > > 
> > > > The model I'm trying to run using MPICH starts off fine now, but then hangs at a certain point.  I'm not sure whether there is still a problem or not. 
> > > > 
> > > > 
> > > > 
> > > > Thanks, 
> > > > 
> > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > 04/27/2011 05:13 PM 
> > > > 
> > > > To
> > > > Andy_Holland at URSCorp.com
> > > > cc
> > > > Subject
> > > > Re: [mpich-discuss] Possible setup problem
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > The problem is that machine A is unable to determine what its IP address is from its hostname.  So if you do a
> > > >    hostname
> > > > from machine A, it should return A (or A.foo.com).  Then you should be able to do
> > > >    host A 
> > > > (or "host A.foo.com") and get the IP address of the machine.  It looks like your machines are returning the loopback address.  It's possible that you just need to make sure that the /etc/hosts file on each machine has _its_own_ name in there (the one returned by hostname) and that it's set to the machine's actual IP address (and not 127.0.0.1).
> > > > 
> > > > I'm not an expert in configuring networks, so I can't really be more specific.  Sorry.
> > > > 
> > > > -d 
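
To look for the /etc/hosts situation described above, something like the following sketch could be used. loopback_entries is a hypothetical helper, not an MPICH tool; it assumes the usual hosts format (an address followed by one or more names, with "#" starting a comment).

```shell
#!/bin/sh
# loopback_entries NAME [FILE]
# Print every line of a hosts-format file (default /etc/hosts) that maps
# NAME to a loopback address (127.x.x.x or ::1). Hypothetical helper.
loopback_entries() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }                  # strip comments; fields are re-split
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print; next }
        }
    ' "$file"
}

# Example (run on each node):
#   loopback_entries "$(hostname)"
# Any output means the machine's own name is bound to a loopback address.
```

An empty result only says that /etc/hosts is not the source of a loopback answer; the name could still resolve incorrectly through another mechanism.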
> > > > 
> > > > On Apr 27, 2011, at 4:06 PM, Andy_Holland at URSCorp.com wrote:
> > > > 
> > > > > 
> > > > > The /etc/hosts file only has the short names in it.  I'm not exactly sure what the networking issue is that I need to let the sysadmin know about.  Can you please explain it to me? 
> > > > > 
> > > > > Thanks, 
> > > > > 
> > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > 04/27/2011 04:53 PM 
> > > > > 
> > > > > To
> > > > > Andy_Holland at URSCorp.com
> > > > > cc
> > > > > Subject
> > > > > Re: [mpich-discuss] Possible setup problem
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > How are the machines getting the IP address when using the full name?  If they're in /etc/hosts, then I would go ahead and add the short names there.  Otherwise, while adding the short names there will work, there's another network configuration problem that's causing this and may give you trouble in the future, so it might be worth it to find a sysadmin to help you.  (I'm lucky enough to have great sysadmins here, so I don't (have to) know too much about configuring networking.)
> > > > > 
> > > > > -d
> > > > > 
> > > > > On Apr 27, 2011, at 3:46 PM, Andy_Holland at URSCorp.com wrote:
> > > > > 
> > > > > > 
> > > > > > I just tried doing the host command with the full name of the machine including the domain and it is returning the correct IP address for each machine.  The /etc/hosts files on the machines do not include the domain in the machine name.  Maybe they should? 
> > > > > > 
> > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > 04/27/2011 02:58 PM 
> > > > > > 
> > > > > > To
> > > > > > Andy_Holland at URSCorp.com
> > > > > > cc
> > > > > > Subject
> > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > I think I found the problem.  I should have checked this earlier.  It looks like your machines are set up to return 127.0.0.1 (the loopback address) when resolving their own hostname, rather than their actual IP address.
> > > > > > 
> > > > > > Try this on s051rhlapp01:
> > > > > >  hostname
> > > > > > It should return s051rhlapp01.  Then try:
> > > > > >  host s051rhlapp01
> > > > > > It should NOT return 127.0.0.1.  Then try the same thing on s051rhlapp02 (using its own name).
> > > > > > 
> > > > > > If you don't get what you should, it indicates a problem with your network configuration.
> > > > > > 
> > > > > > -d
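
The check described above can be scripted. This is a sketch under assumptions: it uses "getent hosts" rather than "host", on the assumption that getent is available, because getent goes through the system resolver configuration (/etc/hosts and DNS, as set in nsswitch.conf) that applications normally use, whereas host queries DNS directly. is_loopback and check_own_name are illustrative names only.

```shell
#!/bin/sh
# is_loopback ADDR: succeed if ADDR is a loopback address.
is_loopback() {
    case $1 in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# check_own_name: resolve this machine's own hostname and report
# whether the answer is usable by other nodes. Hypothetical helper.
check_own_name() {
    name=$(hostname)
    addr=$(getent hosts "$name" | awk '{ print $1; exit }')
    if [ -z "$addr" ]; then
        echo "$name: does not resolve"
        return 2
    elif is_loopback "$addr"; then
        echo "$name -> $addr (loopback; other nodes cannot connect to this)"
        return 1
    else
        echo "$name -> $addr"
        return 0
    fi
}

# Example (run on each node): check_own_name
```

Running it on every machine in the machine file gives one line per node, which is easy to pass on to a sysadmin.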
> > > > > > 
> > > > > > On Apr 26, 2011, at 5:04 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > 
> > > > > > > 
> > > > > > > Here ya go. 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > > 04/26/2011 05:56 PM 
> > > > > > > 
> > > > > > > To
> > > > > > > Andy_Holland at URSCorp.com
> > > > > > > cc
> > > > > > > Subject
> > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Oops, I forgot to mention that you need to recompile the simple_test file:
> > > > > > > 
> > > > > > >  mpicc simple_test.c -o simple_test
> > > > > > > 
> > > > > > > Can you try it again?
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > -d
> > > > > > > 
> > > > > > > On Apr 26, 2011, at 3:45 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > 
> > > > > > > > 
> > > > > > > > Ok, I applied the patch and rebuilt both installations and reran your test program.  Attached is the log file. 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Thank you, 
> > > > > > > > 
> > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov>
> > > > > > > > 04/26/2011 02:20 PM 
> > > > > > > > 
> > > > > > > > To
> > > > > > > > Andy_Holland at URSCorp.com
> > > > > > > > cc
> > > > > > > > Subject
> > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Hmm.  I found a bug with error reporting.  While this won't directly fix your problem, it may help with identifying it.
> > > > > > > > 
> > > > > > > > Can you apply this patch, then rebuild and re-install mpich2 on both machines?
> > > > > > > > 
> > > > > > > >    (from the mpich2 source directory)
> > > > > > > >    patch -p0 < errno.patch
> > > > > > > >    make clean
> > > > > > > >    make
> > > > > > > >    make install
> > > > > > > > 
> > > > > > > > Then try the simple_test.c again and send us the log.
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > -d
> > > > > > > > 
> > > > > > > > [attachment "errno.patch" deleted by Andy Holland/Denver/URSCorp] 
> > > > > > > > 
> > > > > > > > On Apr 26, 2011, at 11:28 AM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Ok, I turned iptables off on both machines and reran it.  Attached is the log file. 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > 04/26/2011 11:13 AM
> > > > > > > > > Please respond to
> > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > To
> > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > cc
> > > > > > > > > Subject
> > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > For some reason, it's not showing the specific socket error, but it's happening when a process on s051rhlapp02 tries to send a message to a process on s051rhlapp01.  Can you try disabling the firewalls on the machines and try it again?
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > -d
> > > > > > > > > 
> > > > > > > > > On Apr 25, 2011, at 5:39 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Yeah, I put it in the wrong directory.  Ok, I reran in a shared area and I've attached the log file. 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Thanks, 
> > > > > > > > > > 
> > > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > 04/25/2011 05:45 PM
> > > > > > > > > > Please respond to
> > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > To
> > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > cc
> > > > > > > > > > Subject
> > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Andy,
> > > > > > > > > > 
> > > > > > > > > > Looking through the log file, I see a line that says:
> > > > > > > > > > 
> > > > > > > > > > [proxy:0:1 at s051rhlapp02] launch_procs (/usr/local/mpich2-1.3.2p1/src/pm/hydra/pm/pmiserv/pmip_cb.c:639): unable to change wdir to /home/andy_holland/mpich_test (No such file or directory)
> > > > > > > > > > 
> > > > > > > > > > Can you check that you can access /home/andy_holland/mpich_test from s051rhlapp02 ?
> > > > > > > > > > 
> > > > > > > > > > If not, put simple_test into a directory that's accessible from both machines, and try it again.
> > > > > > > > > > 
> > > > > > > > > > Thanks,
> > > > > > > > > > -d
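
A quick way to run that directory check on every node at once might look like the sketch below. It assumes a machine file with one "host" or "host:count" entry per line and passwordless ssh to each node; machinefile_hosts and check_wdir are illustrative names, and the actual contents of the machine file are not shown in this thread.

```shell
#!/bin/sh
# machinefile_hosts FILE
# Print the unique host names in a machine file. Assumes one "host" or
# "host:count" entry per line, with "#" starting a comment.
machinefile_hosts() {
    sed -e 's/#.*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1" |
        awk '{ print $1 }' | cut -d: -f1 | sort -u
}

# check_wdir DIR FILE
# Report whether DIR exists on every host listed in FILE.
# Assumes ssh works without a password prompt (BatchMode refuses to prompt).
check_wdir() {
    dir=$1
    mf=$2
    status=0
    for h in $(machinefile_hosts "$mf"); do
        if ssh -o BatchMode=yes "$h" test -d "$dir"; then
            echo "$h: ok"
        else
            echo "$h: cannot access $dir"
            status=1
        fi
    done
    return $status
}

# Example: check_wdir /home/andy_holland/mpich_test machine8
```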
> > > > > > > > > > 
> > > > > > > > > > On Apr 25, 2011, at 3:55 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Darius, 
> > > > > > > > > > >         Thanks.  If I had just thought for a second longer I would have had it.  Attached is the log file for your test program. 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > > 04/25/2011 04:32 PM
> > > > > > > > > > > Please respond to
> > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > To
> > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > cc
> > > > > > > > > > > Subject
> > > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Sorry.  Just run:
> > > > > > > > > > >    mpicc simple_test.c -o simple_test
> > > > > > > > > > > 
> > > > > > > > > > > If you needed to specify the full path for mpiexec, use the same path for mpicc.  This will generate the executable called simple_test.
> > > > > > > > > > > 
> > > > > > > > > > > -d
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > On Apr 25, 2011, at 3:26 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Darius, 
> > > > > > > > > > > >         Thanks for your help with this.  You'll have to forgive me though, I'm a Fortran programmer and I'm not exactly sure how to compile the program you sent me.  I have gcc by the way. 
> > > > > > > > > > > > 
> > > > > > > > > > > > Thanks, 
> > > > > > > > > > > > 
> > > > > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > > > 04/25/2011 03:19 PM
> > > > > > > > > > > > Please respond to
> > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > To
> > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > cc
> > > > > > > > > > > > Subject
> > > > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > OK, can you try the attached test program with the same number of processes and machine file, and also add the -l option to mpiexec (to label the lines of output with the rank)?
> > > > > > > > > > > > 
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > -d
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > [attachment "simple_test.c" deleted by Andy Holland/Denver/URSCorp] 
> > > > > > > > > > > > On Apr 25, 2011, at 2:00 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I've attached the log for running cpi using the same machinefile. 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Thank you, 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Darius Buntinas <buntinas at mcs.anl.gov> 
> > > > > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > > > > 04/25/2011 02:51 PM
> > > > > > > > > > > > > Please respond to
> > > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > To
> > > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > > cc
> > > > > > > > > > > > > Subject
> > > > > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Hi Andy,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Can you try running cpi from the examples directory of the MPICH2 source tree with the same number of processes and the same machine file?  Let us know if that works, and, if not, send us the output, please.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > -d
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Apr 25, 2011, at 1:30 PM, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > It was suggested that I send out all the error messages.  I've attached a log file from the model. 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Thank you, 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Dave Goodell <goodell at mcs.anl.gov> 
> > > > > > > > > > > > > > Sent by: mpich-discuss-bounces at mcs.anl.gov
> > > > > > > > > > > > > > 04/25/2011 02:22 PM
> > > > > > > > > > > > > > Please respond to
> > > > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > To
> > > > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > > > cc
> > > > > > > > > > > > > > Subject
> > > > > > > > > > > > > > Re: [mpich-discuss] Possible setup problem
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Apr 25, 2011, at 12:59 PM CDT, Andy_Holland at URSCorp.com wrote:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > When I run from either machine using CPUs from both machines the run stops with many mpi messages.  Below is the last message in the list: 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > main (/usr/local/mpich2-1.3.2p1/src/pm/hydra/ui/mpich/mpiexec.c:404): process manager error waiting for completion 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Can you send us all of the error messages?  Typically the first error messages are the most useful/relevant; the last ones often are just messages announcing some sort of cleanup or secondary error caused by the original error.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > -Dave
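
To pull the earliest error-looking lines out of a long log, a small helper along these lines may help. The match patterns are a guess at what such messages contain, not a list taken from MPICH, and first_errors is a hypothetical name.

```shell
#!/bin/sh
# first_errors LOG [N]
# Print the first N (default 10) lines of LOG that look like errors,
# each prefixed with its line number. The patterns are assumptions.
first_errors() {
    log=$1
    n=${2:-10}
    grep -n -i -E 'error|unable|fail' "$log" | head -n "$n"
}

# Example: first_errors run.cctm.log 5
```

The line numbers make it easy to quote the surrounding context when posting to the list.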
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > > > mpich-discuss mailing list
> > > > > > > > > > > > > > mpich-discuss at mcs.anl.gov
> > > > > > > > > > > > > > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss