[mpich-discuss] Need some help getting mpich to work

Dave Goodell goodell at mcs.anl.gov
Tue Nov 24 13:25:21 CST 2009


Your new "make testing" run is having problems with the MPI-I/O  
tests.  What sort of file system are you running this on?

As for the mpdboot problems, I'm not sure what is happening.  I would  
attempt to use the mpdcheck tool described in Appendix A of the  
Installer's Guide [1] to diagnose the problem.
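
If mpdboot leaves you with a partial ring, it may help to see exactly
which hosts never joined before digging in with mpdcheck.  Here is a
minimal sketch, assuming the hosts file has one host per line
(optionally followed by ":ncpus") and that mpdtrace prints the same
short names.  The missing_hosts helper is hypothetical and not part of
MPICH2:

```shell
# missing_hosts HOSTFILE: reads mpdtrace output on stdin and prints the
# hosts listed in HOSTFILE that do not appear in it.
# Assumption: names in HOSTFILE match the names mpdtrace prints.
missing_hosts() {
    trace=$(mktemp) || return 1
    cat > "$trace"
    # Drop ":ncpus" suffixes, comments, and blank lines, then keep only
    # the hosts with no exact whole-line match in the trace.
    sed -e 's/[:# ].*//' -e '/^$/d' "$1" | grep -vxF -f "$trace"
    rm -f "$trace"
}

# Usage (with a running ring):
#   mpdtrace | missing_hosts /home/su/mpd.hosts
```

Any host this prints is a good candidate for a closer look with
mpdcheck.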

You might also be having trouble because you are running part of your  
jobs on the cluster head node.  MPD's mpiexec will attempt to run 1  
process locally first unless passed the "-1" option.  Unfortunately  
there is no easy way to pass that option to the "make testing" process.

You can also try using hydra instead of MPD.  You should just be able  
to run "mpiexec.hydra -f /home/su/mpd.hosts -n 124 hostname | sort |  
uniq -c | sort -n" to sanity check that it works (you should get 31  
lines, each starting with 4).  If it does work for you, you can  
rebuild MPICH2 with "--with-pm=hydra:mpd" to make hydra the default  
mpiexec.  Hydra will use the hostfile specified on the command line  
but it will also look at the file specified by the $HYDRA_HOST_FILE  
environment variable.  See [2] for more information on using hydra.
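
If you would rather script that sanity check than eyeball it,
something like the following could work.  It is only a sketch:
check_hosts is a hypothetical helper, and the 31 and 4 are just the
numbers implied by the command above (124 processes over 31 hosts).

```shell
# check_hosts NODES PER_NODE: reads one hostname per line on stdin and
# succeeds only if exactly NODES distinct hosts appear, each PER_NODE times.
check_hosts() {
    sort | uniq -c | awk -v nodes="$1" -v per="$2" '
        # uniq -c puts the occurrence count in the first field
        $1 != per { bad++; print "unexpected count: " $0 }
        END {
            if (NR != nodes) {
                print "expected " nodes " hosts, saw " NR
                exit 1
            }
            exit (bad ? 1 : 0)
        }'
}

# Usage (assumes mpiexec.hydra is on your PATH):
#   mpiexec.hydra -f /home/su/mpd.hosts -n 124 hostname | check_hosts 31 4
```

A nonzero exit status means either a host is missing or the processes
were not spread evenly.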

-Dave

[1] http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.2-installguide.pdf
[2] http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager

On Nov 24, 2009, at 12:50 PM, Hung-Hsun Su wrote:

> Unfortunately, the latest release did not solve the issue. It
> actually introduces a new bug. After installation, mpdboot cannot
> set up the environment correctly. It freezes when I make the
> following call.
>
> [su at alpha ~]$ which mpdboot
> /home/su/software/mpich2-1.2.1/bin/mpdboot
> [su at alpha ~]$ mpdboot -n 32 --ncpus=1 -f /home/su/mpd.hosts
>
> After I killed the process, I found that only 4 of the 31
> compute nodes were set up:
>
> [su at alpha ~]$ mpdtrace
> alpha
> compute-0-3
> compute-0-1
> compute-0-0
> compute-0-2
>
> I then tried setting up using the 1.2 version and it works fine.
>
> [su at alpha ~]$ /home/su/software/mpich2-1.2/bin/mpdboot -n 32 --ncpus=1 -f /home/su/mpd.hosts
> [su at alpha ~]$ mpdtrace
> alpha
> compute-0-3
> compute-0-11
> compute-0-10
> compute-0-9
> compute-0-8
> compute-0-1
> compute-0-15
> compute-0-14
> compute-0-13
> compute-0-12
> compute-0-0
> compute-0-19
> compute-0-27
> compute-0-26
> compute-0-25
> compute-0-24
> compute-0-18
> compute-0-30
> compute-0-29
> compute-0-28
> compute-0-17
> compute-0-16
> compute-0-2
> compute-0-7
> compute-0-6
> compute-0-5
> compute-0-4
> compute-0-23
> compute-0-22
> compute-0-21
> compute-0-20
>
> I then ran make testing and got even more errors. Does anyone have
> an idea of what is going on?
>
> Hung-Hsun
>
> PS. I've attached the output files from various steps
> c.txt - configuration
> m.txt - make
> mi.txt - make install
> mpd.hosts - my machine file
> mtest.txt - make testing
> summary.xml - output in test/mpi directory from make testing
>
>> Can you try the latest release 1.2.1?
>>
>> Rajeev
>>
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Hung-Hsun Su
>>> Sent: Monday, November 23, 2009 11:16 AM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: [mpich-discuss] Need some help getting mpich to work
>>>
>>> Hi,
>>>
>>> I was wondering if anyone can help me figure out why my MPICH2  
>>> installation isn't working correctly. I downloaded the v1.2
>>> release, configured it using
>>> "configure --prefix=/home/su/software/mpich2-1.2", then ran make
>>> and make install, and everything seemed fine (I've attached the
>>> three txt outputs from configure, make, and make install, which
>>> show no errors).  I then tried to check whether my installation
>>> works correctly by running the mpich-test suite (results given in
>>> summary.xml), and some of the collective tests failed.  Does
>>> anyone know what might be the cause of my problem? Thanks.
>>>
>>> System spec:
>>> 32-node quad-core Xeon cluster
>>> Linux version 2.6.9-55.0.2.ELsmp (mockbuild at builder6.centos.org)  
>>> (gcc version 3.4.6 20060404 (Red Hat 3.4.6-8)) #1 SMP Tue Jun 26  
>>> 14:14:47 EDT 2007
>>>
>>> Hung-Hsun
>>>
>>> -- 
>>>
>>> --------------------------------------------------------------
>>> ---------------------------------------------
>>> Sincerely,
>>> Hung-Hsun Su
>>> Ph.D. Student, UPC Group Leader, Research Assistant, Teaching  
>>> Assistant
>>> High-performance Computing and Simulation (HCS) Research Laboratory
>>> Dept. of Electrical and Computer Engineering , University of  
>>> Florida,
>>> Gainesville, FL 32611-6200
>>> Email: su at hcs.ufl.edu, hunghsun at ufl.edu
>>> --------------------------------------------------------------
>>> ----------------------------------------------
>>>
>>>
>>>
>>
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>
>
> -- 
>
> -----------------------------------------------------------------------------------------------------------
> Sincerely,
> Hung-Hsun Su
> Ph.D. Student, UPC Group Leader, Research Assistant, Teaching  
> Assistant
> High-performance Computing and Simulation (HCS) Research Laboratory
> Dept. of Electrical and Computer Engineering , University of Florida,
> Gainesville, FL 32611-6200
> Email: su at hcs.ufl.edu, hunghsun at ufl.edu
> ------------------------------------------------------------------------------------------------------------
>
> <mi.txt> <c.txt> <m.txt> <mpd.hosts> <summary.xml> <mtest.txt>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


