[mpich-discuss] Multithread server using MPICH-2

Darius Buntinas buntinas at mcs.anl.gov
Fri Sep 19 10:55:38 CDT 2008


We have seen performance problems with multithreaded processes when 
oversubscribing the processor cores (i.e., more threads or processes 
than cores).  In 1.1, the default channel is nemesis, which uses 
shared memory for communicating with processes on the same node. 
Nemesis uses busy polling, so even if a thread is waiting in an MPI 
function, it is still using processor time and preventing other threads 
from using the processor.

We are working on fixing this by yielding the processor at the right 
time when polling.  The fact that calling sleep makes it work indicates 
that the problem you are seeing is probably due to the busy waiting and 
oversubscription problem.
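
To illustrate what "yielding the processor when polling" means, here is a
minimal sketch in C.  It is not the nemesis code; the flag, the function
names and the spin threshold are all hypothetical and only stand in for
"a message has arrived".  The poller spins on a flag and calls
sched_yield() after a fixed number of failed checks, so another runnable
thread can use the core.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag standing in for "a message has arrived". */
static atomic_int message_ready = 0;

/* Poll the flag; after every `spins_before_yield` failed checks, give up
 * the core with sched_yield().  Returns how many times we yielded. */
static long poll_with_yield(atomic_int *flag, int spins_before_yield)
{
    assert(flag != NULL && spins_before_yield > 0);
    long yields = 0;
    int spins = 0;
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        if (++spins >= spins_before_yield) {
            sched_yield();      /* let another runnable thread run */
            spins = 0;
            ++yields;
        }
    }
    return yields;
}

/* Producer thread: sets the flag so the poller can return. */
static void *producer(void *arg)
{
    atomic_store_explicit((atomic_int *)arg, 1, memory_order_release);
    return NULL;
}

/* Start a producer, poll until it signals, and return the final flag
 * value (1 on success, -1 if the thread could not be created). */
static int run_demo(void)
{
    pthread_t tid;
    atomic_store(&message_ready, 0);
    if (pthread_create(&tid, NULL, producer, &message_ready) != 0)
        return -1;
    (void)poll_with_yield(&message_ready, 100);
    pthread_join(tid, NULL);
    return atomic_load(&message_ready);
}
```

Calling run_demo() from main() exercises the loop once.  A pure busy poll
is the same loop without the sched_yield() call, which is why a waiting
thread can starve others when there are more threads than cores.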

In the meantime, you could try avoiding oversubscription.  Or you can 
use the sock channel, which doesn't use shared memory but does block 
(relinquishing the processor) in blocking MPI functions.  To use the 
sock channel, configure with "--with-device=ch3:sock".
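
As a rough way to check whether you are oversubscribing, you can compare
the number of threads you expect to be runnable on a node with the number
of online cores.  The helper below is only a sketch: is_oversubscribed()
is a hypothetical name, you have to supply the thread count yourself
(all MPI processes on the node plus any extra pthreads), and
_SC_NPROCESSORS_ONLN is a widely available extension (Linux, BSD, macOS)
rather than something POSIX guarantees.

```c
#include <assert.h>
#include <unistd.h>

/* Returns 1 if `runnable_threads` exceeds the number of online cores,
 * 0 if it does not, and -1 if the core count cannot be determined. */
static int is_oversubscribed(long runnable_threads)
{
    assert(runnable_threads >= 0);
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (cores < 1)
        return -1;
    return runnable_threads > cores ? 1 : 0;
}
```

For example, a server process with one accepting thread and three worker
threads would call is_oversubscribed(4), plus one for every other MPI
process placed on the same node.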

Another thing that may be making the problem worse is that the 
sched_yield() function is broken on new kernels.  If you have root 
access, you can fix this with:
   echo 1 > /proc/sys/kernel/sched_compat_yield
See if this helps.

As I mentioned, we are working on this issue and expect to have it 
working in the final 1.1 release.

-d

On 09/18/2008 11:03 PM, Gisele Machado de Souza wrote:
> Hello,
> 
> I'm implementing a server in MPI that accepts more than one connection 
> from clients at the same time.
> To do that I used MPI (mpich2-1.1.0a1.tar.gz) and pthreads.
> 
> What I want to do is a server that stays in an infinite loop waiting for 
> connections (MPI_Comm_accept(portMD, MPI_INFO_NULL, 0, MPI_COMM_WORLD, 
> newCommClient);). When a connection is established, it creates a new 
> thread and returns to wait for more connections.  That means the server 
> and the thread will work in parallel.
> 
> The function that the thread will execute calls MPI functions, like 
> MPI_Probe, MPI_Get_count, MPI_Recv, MPI_Send, MPI_Pack and MPI_Unpack.
> 
> The problem I'm having is that the server and the thread are not working 
> in parallel successfully. Sometimes the program hangs and does nothing, 
> and at other times a fatal error appears (Assertion failed in file 
> sock_wait.i at line 236: (pollfd->events & (0x001 | 0x004)) || 
> pollfd->fd == -1).
> 
> When I put the server to sleep for a moment before it waits for another 
> connection, the created thread works fine while the server is sleeping. 
> Once the server wakes up and starts to wait for a connection, 
> things stop working.
> 
> A piece of my code (Server):
> -----------------------------------------------------------------------------------------------------------------------------
> // arguments passed to a thread
> typedef struct
> { MPI_Comm communicator;
>   char * path1;
>   char * path2;
>   char * port;  
> } ThreadParam;
> 
>  pthread_t threads[10];
>  int t =0;
>  int rc;
> 
> 
>  ThreadParam * listParam = NULL;
>  MPI_Comm * newCommClient;
>    
> 
>   /* server infinite loop */
>  while (time_out > 10)
>  {
>   
>      newCommClient = malloc(sizeof(MPI_Comm));
>    
>    
>      /* waiting for a connection */
>      MPI_Comm_accept(portMD, MPI_INFO_NULL, 0, MPI_COMM_WORLD, newCommClient);
>           
>      listParam = malloc(sizeof(ThreadParam));     
>      /* the thread will use this communicator to talk with the client */
>      listParam->communicator = *newCommClient;
>      listParam->path1 = argv[2];
>      listParam->path2 = argv[4];
>      listParam->port = portMD;
>    
>      rc = pthread_create(&threads[t], NULL, threadfunc, (void *) listParam);
>    
>      if (rc){
>          printf("ERROR; return code from pthread_create() is %d\n", rc);
>          exit(-1);
>      }
>      //sleep(1);
>      t++;
>   }
> 
> -----------------------------------------------------------------------------------------------------------------------------
> Please, I need help to solve this problem!!
> 
> Thanks very much!
> 
> Gisele



