[mpich-discuss] [mvapich-discuss] How many Groups ?

Dave Goodell goodell at mcs.anl.gov
Mon Feb 27 08:49:28 CST 2012


Do you actually need separate communicators so that you can invoke collective operations?  Could you instead share the communicator among many/all threads and separate point-to-point communication by using unique tags for each thread?
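
For example, a minimal sketch along those lines (illustrative only; it assumes the MPI library actually grants MPI_THREAD_MULTIPLE, uses pthreads, and uses the thread index as the tag to keep each thread's traffic separate on the shared communicator):

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Each thread communicates on MPI_COMM_WORLD, using its thread index
 * as the tag so messages from different threads never match each other. */
static void *worker(void *arg)
{
    int tid = *(int *)arg;
    int rank, size, left, right, sendbuf, recvbuf = -1;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    right = (rank + 1) % size;
    left  = (rank + size - 1) % size;
    sendbuf = rank * 100 + tid;

    /* tag = tid separates this thread's point-to-point traffic */
    MPI_Sendrecv(&sendbuf, 1, MPI_INT, right, tid,
                 &recvbuf, 1, MPI_INT, left, tid,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank %d thread %d received %d\n", rank, tid, recvbuf);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, i, tids[NUM_THREADS];
    pthread_t threads[NUM_THREADS];

    /* Threads call MPI concurrently, so MPI_THREAD_MULTIPLE is required. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    for (i = 0; i < NUM_THREADS; ++i) {
        tids[i] = i;
        pthread_create(&threads[i], NULL, worker, &tids[i]);
    }
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);

    MPI_Finalize();
    return 0;
}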

-Dave

On Feb 27, 2012, at 8:31 AM CST, Lewis Alderton wrote:

> We were hoping to use more than 2000 communicators - on each node we want
> about 100 processes, each running about 30 threads. Creating 3000
> communicators (one per thread) seemed to be the easiest way of doing this
> (that way our job controller can broadcast a job to each thread). Is
> there a better way of doing this?
> 
> 
> 
> From:	Pavan Balaji <balaji at mcs.anl.gov>
> To:	mpich-discuss at mcs.anl.gov
> Cc:	Krishna Kandalla <kandalla at cse.ohio-state.edu>,
>            mvapich-core at cse.ohio-state.edu, Lewis
>            Alderton/Marlborough/IBM at IBMUS
> Date:	02/26/2012 11:25 AM
> Subject:	Re: [mpich-discuss] [mvapich-discuss] How many Groups ?
> 
> 
> 
> 
> Why are you not freeing the older communicators?  Are you really
> looking for more than 2000 *active* communicators?  The number of bits
> set aside for context IDs can be increased, but that will lose some
> internal optimizations within MPICH2 that are used when the source, tag,
> and context ID all fit within 64 bits for queue searches.
> Alternatively, you can take away some number of bits from the tag space
> and give it to the context ID space.
> 
> But this looks like a bad application that doesn't free its resources.
> Fixing the application seems much easier.
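> 
> As a sketch (num_jobs here is only an illustrative placeholder), the idea
> is simply to free each communicator as soon as it is no longer needed, so
> that its context ID goes back to the pool:
> 
>     MPI_Comm comm;
>     int i;
>     for (i = 0; i < num_jobs; ++i) {
>         MPI_Comm_dup(MPI_COMM_WORLD, &comm);
>         /* ... collectives / point-to-point on comm ... */
>         MPI_Comm_free(&comm);  /* releases the context ID */
>     }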
> 
>  -- Pavan
> 
> On 02/25/2012 11:43 PM, Krishna Kandalla wrote:
>> Hi,
>>    We recently received the following post and we seem to have the
>> same behavior with mpich2-1.5a1. We realize that this is because we are
>> running out of context IDs. Do you folks think it is feasible to increase
>> the range of allowable context IDs?
>> 
>> 
>>    The following code can be used to reproduce this behavior:
>> 
>> ---------------------------------------------------------
>> 
>> #include <mpi.h>
>> #include <stdlib.h>
>> 
>> int main(int argc, char **argv)
>> {
>>     int num_groups = 1000, my_rank, world_size, *group_members, i;
>>     MPI_Group orig_group, *groups;
>>     MPI_Comm *comms;
>> 
>>     MPI_Init(&argc, &argv);
>> 
>>     if (argc > 1) num_groups = atoi(argv[1]);
>> 
>>     MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>>     MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
>> 
>>     group_members = calloc(world_size, sizeof(int));
>>     groups = calloc(num_groups, sizeof(MPI_Group));
>>     comms = calloc(num_groups, sizeof(MPI_Comm));
>> 
>>     for (i = 0; i < world_size; ++i) group_members[i] = i;
>> 
>>     /* Each MPI_Comm_create consumes a context ID; none are freed,
>>      * so the run eventually fails with "Too many communicators". */
>>     for (i = 0; i < num_groups; ++i)
>>     {
>>         MPI_Group_incl(orig_group, world_size, group_members, &groups[i]);
>>         MPI_Comm_create(MPI_COMM_WORLD, groups[i], &comms[i]);
>>     }
>> 
>>     MPI_Finalize();
>>     return 0;
>> }
>> 
>> The observed error is :
>> PMPI_Comm_create(656).........:
>> MPI_Comm_create(MPI_COMM_WORLD, group=0xc80700f6, new_comm=0x1d1ecd4) failed
>> PMPI_Comm_create(611).........:
>> MPIR_Comm_create_intra(266)...:
>> MPIR_Get_contextid(554).......:
>> MPIR_Get_contextid_sparse(785): Too many communicators
>> [cli_0]: aborting job:
>> 
>> 
>> 
>> Thanks,
>> Krishna
>> 
>> 
>> On Thu, Feb 23, 2012 at 1:32 PM, Lewis Alderton <lalderto at us.ibm.com> wrote:
>> 
>> 
>>    I'm using MPI_Group_incl to create many groups. There seems to be a
>>    limit of 2048 groups - is there any way to increase this number?
>> 
>>    (I'm using mvapich2-1.8a1p1)
>> 
>>    Thanks.
>> 
>>    _______________________________________________
>>    mvapich-discuss mailing list
>>    mvapich-discuss at cse.ohio-state.edu
>>    http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>> 
>> 
>> 
>> 
>> _______________________________________________
>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> 
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
> 
> 
> 
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


