[mpich-discuss] thread MPI calls
    Pavan Balaji
    balaji at mcs.anl.gov
    Thu Jul 30 01:57:39 CDT 2009

If a process is idle for a long time, the Wait call will keep calling 
poll() to check whether anything has arrived. For example, try this experiment:
Process 0:
	MPI_Irecv();
	MPI_Wait();
Process 1:
	sleep(10);
	MPI_Send();
Process 0 will see a lot of "system activity" as it's busily waiting for 
data to arrive.
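
A runnable version of that experiment might look roughly like this (a 
minimal sketch; the 10-second sleep and the buffer contents are only 
illustrative):

    /* busywait.c -- run with: mpiexec -n 2 ./busywait */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank, buf = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Request req;
            MPI_Irecv(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            /* MPI_Wait keeps polling until the message shows up, so
             * rank 0 burns CPU ("system activity") for ~10 seconds. */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            printf("rank 0 received %d\n", buf);
        } else if (rank == 1) {
            sleep(10);          /* keep rank 0 waiting */
            buf = 42;
            MPI_Send(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

Watching rank 0 in top while rank 1 sleeps should show the CPU/system 
time accumulating inside MPI_Wait.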
  -- Pavan
On 07/29/2009 03:19 PM, chong tan wrote:
> I'd like to provide further info on what we have experimented with; 
> hopefully this can be of some use, even to my future competitor (they 
> subscribe to this email list too).
>  
> 1.  We replaced Wait_all with Wait_any, and there is no difference in 
>     performance.
> 2.  We experimented with affining the recv thread to the same physical 
>     CPU/same-core pair, a different core pair, and the same core as the 
>     main thread (core pairing done on AMD boxes); see the pinning sketch 
>     after this list.
>     This experiment was done on:
>           - Irecv versus Recv
>           - wait_all and wait_any
>           - early versus late wait_all, wait_any, and wait
>           - wait in recv thread versus wait in main thread
>     Amazingly, running the recv thread on the same core as the main 
>     thread is the fastest; it almost has the same performance as the 
>     non-threaded implementation with some code combos.
>  
>     Early wait, both all and any, is the worst performer.
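
A minimal sketch of the kind of core pinning described in item 2, 
assuming Linux; pin_to_core and the core number are hypothetical, not 
taken from the original experiment:

    /* Pin the calling thread (e.g. the recv thread) to one core.
     * Linux-specific; core numbering is machine dependent. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    static int pin_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

Calling this with the same core number from both the main thread and the 
recv thread would reproduce the "same core as main thread" case above.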
>  
>    Given that we know that in one particular test more than 5% of the 
>    time is spent in the master proc (id == 0) completing already-sent 
>    data via MPI_Recv (from MPE and our monitor, and the fact that the 
>    master proc got the biggest chunk of work in the test), we expected 
>    to see some positive sign of life using threaded MPI, or at least 
>    not the negative gain we have experienced.
>  
>  
> One issue we observed was that Wait* is causing significant sys 
> activity in other processes. Maybe this is the problem, maybe not.
>  
> tan
>  
> ------------------------------------------------------------------------
> *From:* Pavan Balaji <balaji at mcs.anl.gov>
> *To:* mpich-discuss at mcs.anl.gov
> *Sent:* Tuesday, July 28, 2009 7:24:49 PM
> *Subject:* Re: [mpich-discuss] thread MPI calls
> 
> 
>  > we just completed 1 particular test using SERIALIZED, and that made 
>  > no difference (compared to MULTIPLE).
> 
> In that case, the problem is likely with the algorithm itself. In 
> SERIALIZED mode, MPI does not add any locks and will not add any 
> additional overhead. But it looks like your algorithm is blocking, 
> waiting for data from all slave processes before proceeding to the next 
> iteration -- this will cause a lot of idle time. Is there some way you 
> can optimize your algorithm?
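
For reference, requesting the SERIALIZED level discussed above looks 
roughly like this (a minimal sketch; the error check is only 
illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;
        /* Ask for SERIALIZED: several threads may make MPI calls, but
         * the application guarantees only one at a time, so the library
         * does not need its internal locking. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);
        if (provided < MPI_THREAD_SERIALIZED)
            fprintf(stderr, "SERIALIZED not available, got %d\n", provided);

        /* ... application ... */

        MPI_Finalize();
        return 0;
    }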
> 
> -- Pavan
> 
> -- Pavan Balaji
> http://www.mcs.anl.gov/~balaji
> 
-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji
    
    