[mpich-discuss] How is MPI Initialization done ?

Nicholas Karonis karonis at niu.edu
Wed Mar 12 07:29:49 CDT 2008


Dear Krishna,

That's a question for the Globus folks (discuss at globus.org, I think)
but I will try to answer it myself here.

You have correctly identified the Globus lib routines that MPICH-G2
uses to do host+port exchange during bootstrapping.  All of that
inter-machine messaging is passed over ports that Globus (not
MPICH-G2) establishes.   From memory (it's been many years back)
there's a Globus gatekeeper which is a server process running
on the login node of each machine you want your application
to run on.  The gatekeeper is listening on a port.  The globusrun
command connects to that port and a new socket connection is
established between the gatekeeper and the globusrun process.
That new socket connection is handed off to a new process
that the gatekeeper spawns, the job manager (JM).  It
is the JM that runs the whole job (the gatekeeper just handles
security and gets things started).  The JM starts the job and
tells the running app a host+port that it can connect back to.
All bootstrapping comms (e.g., globus_duroc_runtime_inter_subjob_send)
are relayed from the running app through the JM.

Like I said, I did all that from memory going back many years
and so I may have some of the details wrong.  We (MPICH-G2 developers)
didn't write that Globus code ... we simply called into the Globus
lib.  The Globus folks are the ones to ask to get a detailed
(and correct) answer.

Nick

On Mar 11, 2008, at 9:33 AM, Rajeev Thakur wrote:

>
>
> From: owner-mpich-discuss at mcs.anl.gov [mailto:owner-mpich-discuss at mcs.anl.gov 
> ] On Behalf Of Ravi Thati
> Sent: Tuesday, March 11, 2008 8:35 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] How is MPI Initialization done ?
>
> Thanks, Krishna, for your efforts.
> But my doubt relates to MPICH-G2, i.e., mpich1 with the
> globus2 device.
> How is the all-to-all (hostname + port number) exchange done?
>
> So far I have traced the calls to
> globus_duroc_runtime_inter_subjob_send from the intra_subjob_send
> call, which is in turn called from the intra_subjob_bcast call.
>
> Do globus_duroc_runtime_inter_subjob_send and
> globus_duroc_runtime_inter_subjob_receive exchange the
> hostname + port through the gatekeeper, or over other ports?
>
> Any help will be great indeed.
>
> On Tue, Mar 11, 2008 at 5:54 PM, Krishna Chaitanya <kris.c1986 at gmail.com 
> > wrote:
> Hi,
>         I have traced through the point-to-point module a few times,
> so maybe I can share the flow of events:
> MPI_Init does some basic error checking first and calls
> MPIR_Init_thread(), which primarily populates the MPIR_Process data
> structure, which also includes the communicator structure. Apart
> from that, it also initialises some other global variables and the
> channel interface.
>
> > How is the initial communication established in order to exchange
> the hostname + port number details?
>          Suppose you are using the standard blocking send mode; the
> bulk of this work is done in the MPID_Send() function. Here the
> library selects either the eager mode or the rendezvous mode,
> depending on the size of the message that you intend to transfer.
> There is a cut-off size (vc->eager_max_msg_sz) up to which the
> library chooses the eager mode; for data sizes beyond this, the
> rendezvous mode is selected.
>          Suppose the message is small and the data is contiguous;
> then the MPIDI_CH3_EagerContigShortSend() function is invoked. Here
> the eager packet is initialized, the data is copied from the user
> buffer into the eager packet, and MPIDI_CH3_iStartMsg() is invoked.
> This function examines the current connection state; since this is
> the first call, the connection is still unestablished. It creates
> the request data structure and enqueues it, and the progress engine
> takes care of the request from here. The library now tries to form
> a new connection to the node whose rank has been specified, by
> invoking the MPIDI_CH3I_VC_post_connect() function. The destination
> is already known to the library through the rank and the hostname
> (mpd.hosts). This function obtains the destination's IP address
> and port.
>          The library now invokes the progress engine through the
> call to MPIDI_CH3i_Progress_wait(). The actual connection
> establishment and the data transfer take place in this function,
> through calls to MPIDI_CH3I_Progress_handle_sock_event() and
> MPIDU_Sock_wait(). It loops until the completion counter is
> set to 1.
>
> > How are the hostname + port number details sent and received?
>          I remember Dave had once helped me out with this : http://wiki.mcs.anl.gov/mpich2/index.php/Sock_conn_protocol
>
>          It's quite copious in content. The best way to understand
> what is happening is to actually trace through the entire progress
> engine.
>
> > How is the all-to-all communication done? On which ports will
> this be done?
>          I am sure the other, more experienced members can help you
> on this; I haven't done much in this area.
>
>          I just have a few months' exposure to MPICH. If I have gone
> wrong somewhere, please do correct me.
>
> Best,
> Krishna Chaitanya K,
> Final Year B-Tech,
> Dept. of Information Technology,
> National Institute of Technology,Karnataka (NITK)
>
>
>
>
>
>
> On Tue, Mar 11, 2008 at 3:15 AM, Ravi Thati <gotothati at gmail.com>  
> wrote:
> Hello All,
>
>       I am working with MPICH-G2.
>        I came to know (from searching the mailing list) that
> initialization is done while MPI_Init is called.
>       All-to-all exchange of host + port details is done at that point.
>
> My doubts are:
>
>  How is the initial communication established in order to exchange
> the hostname + port number details?
>  How are the hostname + port number details sent and received? What
> are the destination IPs of these packets?
>
>  How is the all-to-all communication done? On which ports will this
> be done?
>  Only after the exchange can each process communicate with the others.
>
>    Thanks for any clarifications.
> -- 
> Regards,
> Ravi.Thati
>
>
>
> -- 
> In the middle of difficulty, lies opportunity
>
>
>
> -- 
> Thanks & Regards,
> Ravi.Thati




More information about the mpich-discuss mailing list