[mpich-discuss] [mvapich-discuss] How many Groups ?
Dave Goodell
goodell at mcs.anl.gov
Mon Feb 27 08:49:28 CST 2012
Do you actually need separate communicators so that you can invoke collective operations? Could you instead share the communicator among many/all threads and separate point-to-point communication by using unique tags for each thread?
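A minimal sketch of that shared-communicator, tag-per-thread pattern is below. It is illustrative only (the thread count, payloads, and pthread plumbing are made up for this sketch, not taken from your application), and it assumes the MPI library actually grants MPI_THREAD_MULTIPLE; run it with at least two ranks.

---------------------------------------------------------

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Each thread keys its point-to-point traffic on its own tag instead of
   owning a private communicator; all threads share MPI_COMM_WORLD. */
static void *worker(void *arg)
{
    int tag = (int)(long)arg;   /* unique tag = this thread's index */
    int rank, job;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        job = 100 + tag;        /* pretend "job" for the thread with this tag */
        MPI_Send(&job, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&job, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1, thread %d received job %d\n", tag, job);
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, i;
    pthread_t threads[NUM_THREADS];

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);   /* the sketch needs full threading */

    for (i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, worker, (void *)(long)i);
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);

    MPI_Finalize();
    return 0;
}

---------------------------------------------------------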
-Dave
On Feb 27, 2012, at 8:31 AM CST, Lewis Alderton wrote:
> We were hoping to use more than 2000 communicators - on each node we want
> about 100 processes, each running about 30 threads. Creating 3000
> communicators (one per thread) seemed to be the easiest way of doing this
> (this way our job controller can broadcast a job to each thread). Is
> there a better way of doing this?
>
>
>
> From: Pavan Balaji <balaji at mcs.anl.gov>
> To: mpich-discuss at mcs.anl.gov
> Cc: Krishna Kandalla <kandalla at cse.ohio-state.edu>,
> mvapich-core at cse.ohio-state.edu, Lewis
> Alderton/Marlborough/IBM at IBMUS
> Date: 02/26/2012 11:25 AM
> Subject: Re: [mpich-discuss] [mvapich-discuss] How many Groups ?
>
>
>
>
> Why are you not freeing the older communicators? Are you really
> looking for more than 2000 *active* communicators? The number of bits
> set aside for context IDs can be increased, but that will lose some
> internal optimizations within MPICH2 that are used when the source, tag
> and context ID all fit within 64 bits for queue searches.
> Alternatively, you can take away some number of bits from the tag space
> and give it to the context ID space.
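As an aside (this query is not from the original exchange): the amount of tag space an MPI library exposes to the application can be checked portably, because the largest usable tag is published through the standard MPI_TAG_UB attribute on MPI_COMM_WORLD; moving bits from tags to context IDs would lower this bound. A minimal query looks like this:

---------------------------------------------------------

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int flag, *tag_ub;

    MPI_Init(&argc, &argv);
    /* MPI_TAG_UB is a predefined attribute whose value is a pointer to an
       int holding the largest tag value the implementation supports. */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag);
    if (flag)
        printf("largest usable tag: %d\n", *tag_ub);
    MPI_Finalize();
    return 0;
}

---------------------------------------------------------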
>
> But this looks like a bad application that doesn't free its resources.
> Fixing the application seems much easier.
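For illustration, here is a sketch of that fix, under the assumption that each per-task communicator really can be released once its task is done (this is not code from the thread): pair every MPI_Comm_create with MPI_Comm_free (and MPI_Group_free) so the implementation can recycle context IDs.

---------------------------------------------------------

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int i, world_size, *members, iterations = 10000;
    MPI_Group world_group, group;
    MPI_Comm comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    members = calloc(world_size, sizeof(int));
    for (i = 0; i < world_size; ++i) members[i] = i;

    /* Far more communicators than the ~2048 context-ID limit, but only one
       is alive at a time because each is freed before the next is created. */
    for (i = 0; i < iterations; ++i) {
        MPI_Group_incl(world_group, world_size, members, &group);
        MPI_Comm_create(MPI_COMM_WORLD, group, &comm);
        /* ... use comm for one task ... */
        MPI_Comm_free(&comm);
        MPI_Group_free(&group);
    }

    MPI_Group_free(&world_group);
    free(members);
    MPI_Finalize();
    return 0;
}

---------------------------------------------------------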
>
> -- Pavan
>
> On 02/25/2012 11:43 PM, Krishna Kandalla wrote:
>> Hi,
>> We recently received the following post and we see the same behavior
>> with mpich2-1.5a1. We realize that this is because we are running out
>> of context IDs. Do you folks think it is feasible to increase the range
>> of allowable context IDs?
>>
>>
>> The following code can be used to reproduce this behavior:
>>
>> ---------------------------------------------------------
>>
>> #include <mpi.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int num_groups = 1000, my_rank, world_size, *group_members, i;
>>     MPI_Group orig_group, *groups;
>>     MPI_Comm *comms;
>>
>>     MPI_Init(&argc, &argv);
>>
>>     if (argc > 1) num_groups = atoi(argv[1]);
>>
>>     MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>>     MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
>>
>>     group_members = calloc(world_size, sizeof(int));
>>     groups = calloc(num_groups, sizeof(MPI_Group));
>>     comms = calloc(num_groups, sizeof(MPI_Comm));
>>
>>     for (i = 0; i < world_size; ++i) group_members[i] = i;
>>
>>     /* Every iteration creates a new communicator and never frees it,
>>        so with enough iterations the library runs out of context IDs. */
>>     for (i = 0; i < num_groups; ++i)
>>     {
>>         MPI_Group_incl(orig_group, world_size, group_members, &groups[i]);
>>         MPI_Comm_create(MPI_COMM_WORLD, groups[i], &comms[i]);
>>     }
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> The observed error is:
>> PMPI_Comm_create(656).........:
>> MPI_Comm_create(MPI_COMM_WORLD, group=0xc80700f6, new_comm=0x1d1ecd4) failed
>> PMPI_Comm_create(611).........:
>> MPIR_Comm_create_intra(266)...:
>> MPIR_Get_contextid(554).......:
>> MPIR_Get_contextid_sparse(785): Too many communicators
>> [cli_0]: aborting job:
>>
>>
>>
>> Thanks,
>> Krishna
>>
>>
>> On Thu, Feb 23, 2012 at 1:32 PM, Lewis Alderton <lalderto at us.ibm.com> wrote:
>>
>>
>> I'm using MPI_Group_incl to create many groups. There seems to be a
>> limit of 2048 groups - any way to increase this number?
>>
>> ( I'm using mvapich2-1.8a1p1 )
>>
>> Thanks.
>>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>