[MPICH2-dev] problem with multithreaded version of MPI_Comm_split() in mpich2-1.03
Rajeev Thakur
thakur at mcs.anl.gov
Mon Jun 26 12:22:33 CDT 2006
One difference is that I am running with the latest code in CVS. I replaced
your Comm_split with a Comm_dup (a simpler function), and even then it hangs
for nprocs=2 and nthreads=3.
Rajeev
_____
From: Ryzhykh, Alexey [mailto:alexey.ryzhykh at intel.com]
Sent: Monday, June 26, 2006 12:19 PM
To: Rajeev Thakur; mpich2-dev at mcs.anl.gov
Cc: Voronov, German; Supalov, Alexander
Subject: RE: [MPICH2-dev] problem with multithreaded version of
MPI_Comm_split() in mpich2-1.03
Rajeev,
I have checked your fix and both fixes together.
I have not seen any problems. I ran the test on a single machine and on
two nodes with 4, 5, 6, 7, 8, 16, 24, 32, and 64 threads.
Regards,
Alexey
_____
From: Rajeev Thakur [mailto:thakur at mcs.anl.gov]
Sent: Monday, June 26, 2006 7:21 PM
To: Ryzhykh, Alexey; mpich2-dev at mcs.anl.gov
Cc: Voronov, German; Supalov, Alexander
Subject: RE: [MPICH2-dev] problem with multithreaded version of
MPI_Comm_split() in mpich2-1.03
Alexey,
I had found this problem, but applied the fix in a different
place (below)
    if (mask_in_use || comm_ptr->context_id > lowestContextId) {
        memset( local_mask, 0, MAX_CONTEXT_MASK * sizeof(int) );
        own_mask = 0;    <----------------------------
        if (comm_ptr->context_id < lowestContextId) {
            lowestContextId = comm_ptr->context_id;
        }
But your test hangs with this fix for nthreads > 4 (on a single machine).
I added your fix as well, but even then it hangs for nthreads > 4.
So I will have to look into it some more. Thanks for pointing out the
problem.
Rajeev
_____
From: owner-mpich2-dev at mcs.anl.gov [mailto:owner-mpich2-dev at mcs.anl.gov] On
Behalf Of Ryzhykh, Alexey
Sent: Monday, June 26, 2006 7:35 AM
To: mpich2-dev at mcs.anl.gov
Cc: Voronov, German; Supalov, Alexander
Subject: [MPICH2-dev] problem with multithreaded version of MPI_Comm_split()
in mpich2-1.03
Hi,
I work in Intel's Parallel System & Applications group.
I used mpich2-1.0.3 with the sock device to run a multithreaded MPI
application.
mpich2-1.0.3 was built with support for the MPI_THREAD_MULTIPLE level.
The application creates a communicator for each thread by calling
MPI_Comm_split().
These communicators are needed so that MPI messages on one thread do not
get mixed up with messages on another thread.
But sometimes I see that the context_id field is the same for different
thread communicators.
I created a test (see the enclosed file test.tgz) to demonstrate the problem.
Note that this test uses internal mpich2-1.0.3 definitions to check the
value of context_id.
Sometimes it fails as below:
nthreads=32
nthreads=32
[0] context_id=184 i=6 j=12
[1] context_id=184 i=6 j=12
test failed, number of errors is 1
test failed, number of errors is 1
The failure shows that context_id is the same for different communicators.
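For readers without the attachment, a check of this kind can be sketched as
a pairwise comparison of the context ids collected on one rank. This is only
an illustration: the function name count_context_id_clashes and the ids array
are hypothetical, and the real test in test.tgz reads context_id from MPICH2
internal structures, which is not shown here.

```c
#include <stdio.h>

/* Hypothetical helper, not part of MPICH2 or of test.tgz.
 * ids[k] is assumed to hold the context id of the communicator
 * created for thread k on this rank. Every pair (i, j) with i < j
 * is compared; each clash is reported in the same layout as the
 * log above. Returns the number of clashing pairs. */
int count_context_id_clashes(int rank, const unsigned *ids, int n)
{
    int errors = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (ids[i] == ids[j]) {
                printf("[%d] context_id=%u i=%d j=%d\n",
                       rank, ids[i], i, j);
                errors++;
            }
        }
    }
    return errors;
}
```

If entries 6 and 12 of the array both hold 184 and all other entries are
distinct, the function reports one clash for that pair, which matches the
shape of the failure above.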
I have prepared a simple fix for this bug; see the attached file
commutil.c.diff.
I believe the bug will be fixed after applying this patch to the source
file mpich2-1.0.3/src/mpi/comm/commutil.c
With best regards,
Alexey Ryzhykh,
---
Intel, Sarov