[mpich-discuss] Multithread server using MPICH-2
Darius Buntinas
buntinas at mcs.anl.gov
Fri Sep 19 10:55:38 CDT 2008
We have seen performance problems with multithreaded processes when
oversubscribing the processor cores (i.e., running more threads or processes
than there are cores). In 1.1, the default channel is nemesis, which uses
shared memory for communicating with processes on the same node.
Nemesis uses busy polling, so even while a thread is waiting in an MPI
function, it is still using processor time and preventing other threads
from running on that core.
We are working on fixing this by yielding the processor at the right
time when polling. The fact that calling sleep() makes your program work
suggests that the problem you are seeing is probably caused by busy
waiting combined with oversubscription.
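The idea of yielding while polling can be illustrated with a minimal, generic pthreads sketch. It is not MPICH's nemesis code: an atomic flag stands in for "a message has arrived", and `poll_with_yield`, `set_flag` and `run_poll_demo` are hypothetical names. Instead of spinning on the check, the waiting thread calls sched_yield() after every unsuccessful check, so that when cores are oversubscribed the thread that will set the flag gets a chance to run.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical helper: wait until *flag becomes nonzero.
 * A pure busy-poll would spin on the load without ever giving up the
 * core; here sched_yield() is called after each unsuccessful check so
 * that other runnable threads can use the processor. */
void poll_with_yield(atomic_int *flag)
{
    while (atomic_load_explicit(flag, memory_order_acquire) == 0)
        sched_yield(); /* let other runnable threads have the CPU */
}

/* Hypothetical producer thread: sets the flag the poller waits on. */
void *set_flag(void *arg)
{
    atomic_store_explicit((atomic_int *)arg, 1, memory_order_release);
    return NULL;
}

/* Starts a producer thread and polls (with yielding) until it has set
 * the flag. Returns 0 on success, -1 if the thread could not be created. */
int run_poll_demo(void)
{
    atomic_int flag = 0;
    pthread_t producer;

    if (pthread_create(&producer, NULL, set_flag, &flag) != 0)
        return -1;
    poll_with_yield(&flag);
    pthread_join(producer, NULL);
    return atomic_load(&flag) == 1 ? 0 : -1;
}
```

run_poll_demo() returns 0 once the producer thread has set the flag. How much sched_yield() actually helps depends on the kernel scheduler, which is why the sched_compat_yield setting mentioned below matters.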
In the meantime, you could try avoiding oversubscription. Alternatively,
you can use the sock channel, which doesn't use shared memory but does
block (relinquishing the processor) in blocking MPI functions. To use the
sock channel, configure with "--with-device=ch3:sock".
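The blocking style can be sketched in the same generic way, assuming a pipe as a stand-in for a socket: the waiting thread sleeps in poll() with an infinite timeout and is woken only when data arrives, so it uses no processor time while it waits. This illustrates the general technique only, not the sock channel's implementation; `write_byte` and `blocking_wait_demo` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical writer thread: sends one byte into the pipe. */
void *write_byte(void *arg)
{
    int fd = *(int *)arg;
    char c = 'x';
    ssize_t n = write(fd, &c, 1);
    (void)n; /* result checked indirectly by the reader */
    return NULL;
}

/* Blocks in poll() with an infinite timeout until the read end of the
 * pipe is readable. While blocked, the calling thread is descheduled by
 * the kernel instead of spinning. Returns the byte read, or -1 on error. */
int blocking_wait_demo(void)
{
    int fds[2];
    pthread_t writer;

    if (pipe(fds) != 0)
        return -1;
    if (pthread_create(&writer, NULL, write_byte, &fds[1]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int rc = poll(&pfd, 1, -1); /* -1: wait indefinitely */
    char c = 0;
    int result = -1;
    if (rc == 1 && (pfd.revents & POLLIN) && read(fds[0], &c, 1) == 1)
        result = c;

    pthread_join(writer, NULL);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

blocking_wait_demo() returns the byte 'x' written by the other thread, or -1 if the pipe or thread could not be set up.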
Another thing that may be making the problem worse is that the
sched_yield() function is broken on newer Linux kernels. If you have
root access, you can fix this with:
echo 1 > /proc/sys/kernel/sched_compat_yield
See if this helps.
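If a program should check at startup whether that setting is enabled, a small helper can read the value back from /proc. This is a sketch assuming the file holds a single integer; `read_sched_compat_yield` is a hypothetical name, and it returns -1 when the file is missing or unreadable, as on kernels that do not provide this setting.

```c
#include <stdio.h>

/* Hypothetical helper: returns the integer stored in
 * /proc/sys/kernel/sched_compat_yield (1 after running the echo command
 * above), or -1 if the file does not exist or cannot be parsed. */
int read_sched_compat_yield(void)
{
    FILE *f = fopen("/proc/sys/kernel/sched_compat_yield", "r");
    if (f == NULL)
        return -1;

    int value = -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A server could call this once before entering its accept loop and print a warning if the result is not 1.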
As I mentioned, we are working on this issue and expect to have it
working in the final 1.1 release.
-d
On 09/18/2008 11:03 PM, Gisele Machado de Souza wrote:
> Hello,
>
> I'm implementing a server in MPI that accepts more than one connection
> from clients at the same time.
> To do that, I used MPI (mpich2-1.1.0a1.tar.gz) and pthreads.
>
> What I want is a server that stays in an infinite loop waiting for
> connections (MPI_Comm_accept(portMD, MPI_INFO_NULL, 0, MPI_COMM_WORLD,
> newCommClient);). When a connection is established, it creates a new
> thread and returns to wait for more connections. That means the server
> and the thread will work in parallel.
>
> The function that the thread executes calls MPI functions such as
> MPI_Probe, MPI_Get_count, MPI_Recv, MPI_Send, MPI_Pack and MPI_Unpack.
>
> The problem I'm having is that the server and the thread do not work
> in parallel correctly. Sometimes the program hangs and does nothing, and
> at other times a fatal error appears (Assertion failed in file
> sock_wait.i at line 236: (pollfd->events & (0x001 | 0x004)) ||
> pollfd->fd == -1).
>
> When I put the server to sleep for a moment before it waits for another
> connection, the created thread works fine while the server is sleeping.
> Once the server wakes up and starts to wait for a connection,
> things stop working.
>
> A piece of my code (Server):
> -----------------------------------------------------------------------------------------------------------------------------
> // arguments passed to a thread
> typedef struct
> { MPI_Comm communicator;
> char * path1;
> char * path2;
> char * port;
> } ThreadParam;
>
> pthread_t threads[10];
> int t =0;
> int rc;
>
>
> ThreadParam * listParam = NULL;
> MPI_Comm * newCommClient;
>
>
> /* server infinite loop */
> while (time_out > 10)
> {
>
> newCommClient = malloc(sizeof(MPI_Comm));
>
>
> /* waiting for a connection */
> MPI_Comm_accept(portMD, MPI_INFO_NULL, 0, MPI_COMM_WORLD,
> newCommClient);
>
> listParam = malloc(sizeof(ThreadParam));
> // with this communicator the thread will talk with the client
> listParam->communicator = *newCommClient;
> listParam->path1 = argv[2];
> listParam->path2 = argv[4];
> listParam->port = portMD;
>
> rc = pthread_create(&threads[t], NULL, threadfunc, (void *) listParam);
>
> if (rc){
> printf("ERROR; return code from pthread_create() is %d\n", rc);
> exit(-1);
> }
> //sleep(1);
> t++;
> }
>
> -----------------------------------------------------------------------------------------------------------------------------
> Please, I need help to solve this problem!!
>
> Thanks very much!
>
> Gisele