[MPICH] MPICH between Playstation3 and Intel PC?

William Gropp gropp at mcs.anl.gov
Mon Mar 12 09:22:17 CDT 2007


This most likely means that the p4 code got the byte order wrong for  
this platform.  The p4 part of the MPICH1 code predates tools like  
configure and portable operating systems, and uses a simple  
identification with the OS name to determine the architectural  
parameters.  You might need to add a new machine type for the  
Playstation so that MPICH1 gets the right byte order.  Let us know if  
you need help finding that code (it's in the ch_p4/p4/lib directory).
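
A quick check that is consistent with the byte-order explanation: the size
reported in the "Could not allocate memory for commandline args" message,
889192448, is 0x35000000 in hex. With its four bytes reversed it becomes 53,
a plausible length for a command line. The Cell processor in the Playstation
is big-endian and the Intel PC is little-endian, so a 32-bit length sent
without conversion would be misread in exactly this way. The sketch below is
illustrative Python only, not part of p4; swap32 is a hypothetical helper
name introduced for this example.

```python
import struct
import sys


def swap32(value: int) -> int:
    """Return a 32-bit unsigned integer with its four bytes reversed."""
    # Pack as big-endian, then read the same bytes back as little-endian.
    return struct.unpack("<I", struct.pack(">I", value))[0]


# Size taken from the p4_error line in the quoted log.
reported = 889192448

print("reported size:", reported, hex(reported))
print("byte-swapped :", swap32(reported))
print("this host is :", sys.byteorder, "endian")
```

Running this on each machine also shows which byte order that host uses,
which is the parameter a new p4 machine type would have to get right.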

Bill

On Mar 8, 2007, at 6:56 AM, Jan Wagner wrote:

>
> On Thu, 8 Mar 2007, Jan Wagner wrote:
>> I am trying to get an MPICH 1.2.7 program to work on Playstation,  
>> and execute an MPICH 1.2.5 program on an Intel PC (named "warp"  
>> below). However, after a short while the Init gives me a  
>> net_conn_to_listener error.
>>
>> It's the same when executing the remote exec command on the  
>> command line:
>>
>> [jwagner@ps3-001 ~]$ ssh warp -l jwagner -n /usr/bin/mpifxcorr  
>> ps3-001.kurp.hut.fi 56995 \-p4amslave \-p4yourname warp
>> DiFX Intel IPP Version
>> About to run MPIInit
>> rm_29828:  p4_error: rm_start: net_conn_to_listener failed: 56995
>
> Well almost there now. I had to shut down iptables on the  
> playstation, the random port forwarding that MPI wants to set up  
> did not really work for some reason. Without firewall it works.
>
> But now it crashes with:
>
> DiFX Generic CPU Version
> About to run MPIInit
> p2_859:  p4_error: Could not allocate memory for commandline args:  
> 889192448
> rm_l_2_871: (1.829863) net_send: could not write to fd=5, errno = 32
> 0.16user 0.39system 0:03.69elapsed 15%CPU (0ap1_4309:  p4_error:  
> net_recv read:  probable EOF on socket: 1
> p3_875:  p4_error: Could not allocate memory for commandline args:  
> 889192448
> rm_l_3_886: (1.346863) net_send: could not write to fd=5, errno = 32
>
> Any ideas?
>
> I'm using p4, by the way. And starting the process with
> $ declare -x P4_RSHCOMMAND="ssh"
> $ declare -x RSHCOMMAND="ssh"
> $ mpirun -v -np 6 -machinefile /nfs/mpitest/machines.CLUSTER \
>     /usr/bin/mpitest /nfs/mpitest/data.in.cfg
>
> Wasn't MPICH-1 supposed to be for heterogeneous clusters too?
>
>  - Jan
>



