[mpich-discuss] Cluster problem running MPI programs

Pavan Balaji balaji at mcs.anl.gov
Tue Apr 10 20:01:59 CDT 2012


[please keep mpich-discuss cc'ed]

Sorry, we can't help you with mpd.

I'm not sure which tutorial you are following, but if you look at the 
README, it will give you step-by-step instructions for building and 
installing mpich2.

  -- Pavan
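
For readers following this thread, here is a minimal sketch of a rebuild that uses Hydra (the default process manager in 1.4.1p1) instead of mpd. It reuses the configure options and prefix from the mpich2version output quoted below. The fresh source tree, the file name "hosts", and the host name node2 are assumptions for illustration (only node1 appears in the quoted error). Whether this removes the signal 4 abort is not established in this thread; the README remains the authoritative reference.

```shell
# Assumption: run from a freshly unpacked mpich2-1.4.1p1 source tree,
# not one already configured with --with-pm=mpd.
./configure --prefix=/home/bchaffin/mpich2 \
            --disable-f77 --disable-fc \
            --with-pm=hydra
make
make install

# Hypothetical host file listing one node per line; node2 is an assumed name.
printf 'node1\nnode2\n' > hosts

# Launch the bundled cpi example with Hydra's mpiexec; no mpd ring is needed.
/home/bchaffin/mpich2/bin/mpiexec -f hosts -n 2 ./examples/cpi
```

Calling mpiexec by its full path avoids accidentally picking up a launcher left over from the earlier mpd-based install.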

On 04/10/2012 07:58 PM, Brice Chaffin wrote:
> I realize mpd is deprecated, but this being my first time, I was
> following tutorials that relied on mpd, so I made the decision to use it
> on my first run. I did very little configuration of mpich2, beyond
> disabling Fortran options (since I have no Fortran compilers installed)
> and setting the install directory. I am not familiar with all the config
> options yet, so I may have left something important out. The problem
> with the tutorials is that they assume everything works the first time.
> I had to do some fine tuning afterwards, and may need to do some more to
> correct this. I'm just not quite sure yet where the trouble is.
>
> On Tue, 2012-04-10 at 19:41 -0500, Pavan Balaji wrote:
>> Hello,
>>
>> How did you configure mpich2?  Please note that mpd is now deprecated
>> and is not supported.  In 1.4.1p1, mpd should not be built at all by
>> default.
>>
>>    -- Pavan
>>
>> On 04/10/2012 07:27 PM, Brice Chaffin wrote:
>>> Hi all,
>>>
>>> I have built a small cluster, but seem to be having a problem.
>>>
>>> I am using Ubuntu Linux 11.04 server edition on two nodes, with an NFS
>>> share for a common directory when running as a cluster.
>>>
>>> According to mpdtrace the ring is fully functional. Both machines are
>>> recognized and communicating.
>>>
>>> I can run regular C programs compiled with gcc using mpiexec or mpirun,
>>> and results are returned from both nodes. When running actual MPI
>>> programs, such as the examples included with MPICH2, or ones I compile
>>> myself with mpicc, I get this:
>>>
>>> rank 1 in job 8  node1_33851   caused collective abort of all ranks
>>>     exit status of rank 1: killed by signal 4
>>>
>>> I am including mpich2version output so you can see exactly how I built
>>> it.
>>>
>>> MPICH2 Version:    	1.4.1p1
>>> MPICH2 Release date:	Thu Sep  1 13:53:02 CDT 2011
>>> MPICH2 Device:    	ch3:nemesis
>>> MPICH2 configure: 	--disable-f77 --disable-fc --with-pm=mpd --prefix=/home/bchaffin/mpich2
>>> MPICH2 CC: 	gcc    -O2
>>> MPICH2 CXX: 	c++   -O2
>>> MPICH2 F77: 	
>>> MPICH2 FC:
>>>
>>> This is my first time working with a cluster, so any advice or
>>> suggestions are more than welcome.
>>>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji
