[MPICH2-dev] problem with multithreaded version of MPI_Comm_split() in mpich2-1.03

Ryzhykh, Alexey alexey.ryzhykh at intel.com
Mon Jun 26 12:19:03 CDT 2006


Rajeev,

I have tested your fix, and also both fixes together.

I have not seen any problems. I ran the test on a single machine and
on two nodes with 4, 5, 6, 7, 8, 16, 24, 32, and 64 threads.

Regards,

Alexey

________________________________

From: Rajeev Thakur [mailto:thakur at mcs.anl.gov] 
Sent: Monday, June 26, 2006 7:21 PM
To: Ryzhykh, Alexey; mpich2-dev at mcs.anl.gov
Cc: Voronov, German; Supalov, Alexander
Subject: RE: [MPICH2-dev] problem with multithreaded version of MPI_Comm_split() in mpich2-1.03

Alexey,

I had found this problem, but I applied the fix in a different place
(below):

        if (mask_in_use || comm_ptr->context_id > lowestContextId) {
            memset( local_mask, 0, MAX_CONTEXT_MASK * sizeof(int) );
            own_mask        = 0;    /* <---- fix added here */
            if (comm_ptr->context_id < lowestContextId) {
                lowestContextId = comm_ptr->context_id;
            }

But with this fix, your test hangs for nthreads > 4 (on a single
machine).

I added your fix as well, but even then it hangs for nthreads > 4.

So I will have to look into it some more. Thanks for pointing out the
problem.
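For context on the fragment quoted above: in MPICH2's context-ID allocation (as we understand the 1.0.x commutil.c code), each process contributes a local copy of its bitmask of free context IDs, the copies are combined with a bitwise-AND reduction (an MPI_Allreduce with MPI_BAND), and a bit that is still set on every process becomes the new ID. Clearing local_mask, as that branch does, therefore leaves nothing free after the reduction, so the round produces no ID and the thread has to try again. The sketch below simulates only that arithmetic in a single process, assuming 32-bit mask words and picking the lowest free bit; MASK_WORDS, band_reduce, find_free_bit and example_round are hypothetical names, and MPICH2's actual mapping from bit index to context_id is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size; MPICH2 sizes its mask with MAX_CONTEXT_MASK. */
#define MASK_WORDS 2

/* A set bit means "this context ID is still free on this process". */
typedef uint32_t ctx_mask_t[MASK_WORDS];

/* Simulates MPI_Allreduce(..., MPI_BAND): an ID is usable only if it
 * is free on every participating process. */
static void band_reduce(ctx_mask_t *masks, int nprocs, ctx_mask_t out)
{
    assert(nprocs > 0);
    for (int w = 0; w < MASK_WORDS; w++) {
        out[w] = UINT32_MAX;
        for (int p = 0; p < nprocs; p++)
            out[w] &= masks[p][w];
    }
}

/* Returns the index of the lowest set bit, or -1 if no ID is free on
 * every process (e.g. because one process contributed an all-zero mask). */
static int find_free_bit(const uint32_t *mask)
{
    for (int w = 0; w < MASK_WORDS; w++)
        for (int b = 0; b < 32; b++)
            if (mask[w] & ((uint32_t)1 << b))
                return w * 32 + b;
    return -1;
}

/* Usage example with two simulated processes. If `defer` is nonzero,
 * process 1 behaves like the quoted branch (mask in use by another
 * thread) and contributes an all-zero mask. */
static int example_round(int defer)
{
    ctx_mask_t masks[2], reduced;
    for (int w = 0; w < MASK_WORDS; w++)
        masks[0][w] = masks[1][w] = UINT32_MAX;
    masks[0][0] &= ~(uint32_t)0x3;  /* IDs 0 and 1 already used on process 0 */
    masks[1][0] &= ~(uint32_t)0x5;  /* IDs 0 and 2 already used on process 1 */
    if (defer)
        memset(masks[1], 0, sizeof masks[1]);
    band_reduce(masks, 2, reduced);
    return find_free_bit(reduced);
}
```

Here example_round(0) returns 3, the lowest ID free on both simulated processes, while example_round(1) returns -1 because the deferring process zeroed its contribution.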

Rajeev

________________________________


	From: owner-mpich2-dev at mcs.anl.gov
[mailto:owner-mpich2-dev at mcs.anl.gov] On Behalf Of Ryzhykh, Alexey
	Sent: Monday, June 26, 2006 7:35 AM
	To: mpich2-dev at mcs.anl.gov
	Cc: Voronov, German; Supalov, Alexander
	Subject: [MPICH2-dev] problem with multithreaded version of MPI_Comm_split() in mpich2-1.03

	Hi,

	I work in the Intel Parallel System & Applications group.

	I used mpich2-1.0.3 with the sock device to run a multithreaded MPI
application.

	MPICH2 1.0.3 was built to support the MPI_THREAD_MULTIPLE thread level.

	The application creates a separate communicator for each thread by
calling MPI_Comm_split().

	These communicators are needed so that MPI messages on one thread do
not get mixed up with messages on another thread.

	But sometimes the context_id field is the same for the communicators
of different threads.

	I wrote a test (see the enclosed file test.tgz) to demonstrate the
problem. Note that the test uses internal mpich2-1.0.3 definitions to
check the value of context_id.

	Sometimes it fails as follows:

	nthreads=32
	nthreads=32
	[0] context_id=184 i=6 j=12
	[1] context_id=184 i=6 j=12
	 test failed, number of errors is 1
	 test failed, number of errors is 1

	The output shows that two different communicators (indices i=6 and
j=12) received the same context_id (184) on both processes.
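The uniqueness check such a test performs can be illustrated without MPICH2 internals. Assuming each thread's communicator context ID has already been collected into an array (the real test reads it through internal mpich2 definitions, which are not reproduced here), a pairwise comparison reports every pair of communicators that share an ID. count_duplicate_ids is a hypothetical name, and its output line only imitates the log format above.

```c
#include <assert.h>
#include <stdio.h>

/* Prints every pair (i, j), i < j, of per-thread communicators whose
 * context IDs coincide, and returns the number of such pairs. */
static int count_duplicate_ids(int rank, const int *context_ids, int nthreads)
{
    assert(context_ids != NULL || nthreads == 0);
    int errors = 0;
    for (int i = 0; i < nthreads; i++)
        for (int j = i + 1; j < nthreads; j++)
            if (context_ids[i] == context_ids[j]) {
                printf("[%d] context_id=%d i=%d j=%d\n",
                       rank, context_ids[i], i, j);
                errors++;
            }
    return errors;
}
```

A caller would then report "test failed, number of errors is N" whenever the returned count is nonzero. The quadratic scan is deliberate: with at most a few dozen threads it is simpler than sorting and directly yields the (i, j) pairs shown in the log.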

	I have prepared a simple fix for this bug; see the attached file
commutil.c.diff.

	I believe the bug is fixed by applying this patch to the source file
mpich2-1.0.3/src/mpi/comm/commutil.c.

	With best regards,

	Alexey Ryzhykh,

	---

	Intel, Sarov
