[mpich-discuss] [mvapich-discuss] How many Groups ?

Pavan Balaji balaji at mcs.anl.gov
Mon Feb 27 11:02:24 CST 2012


Also, from your explanation, it seems like each process only has 30
active communicators, not 3000.  It would help if you could provide us
with the actual example of what you are doing.  The simple code Krishna
sent just dups comm_world; if you run it for 3000 iterations, each
process ends up with 3000 communicators, which is different from your
application.
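
Roughly, that simple test amounts to a loop along these lines (a sketch,
not Krishna's exact code; comms is just an illustrative array).  Every
MPI_Comm_dup call consumes one more context ID:

    MPI_Comm comms[3000];   /* illustrative only */
    for (i = 0; i < 3000; ++i)
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);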

  -- Pavan

On 02/27/2012 09:23 AM, Lewis Alderton wrote:
> Hi Dave - Yes, we can share communicators, and we are going to build a test
> implementation of this model.
> Thanks.
>
>
>
> From:	Dave Goodell<goodell at mcs.anl.gov>
> To:	mpich-discuss at mcs.anl.gov
> Cc:	mvapich-core at cse.ohio-state.edu
> Date:	02/27/2012 09:50 AM
> Subject:	Re: [mpich-discuss] [mvapich-discuss] How many Groups ?
> Sent by:	mpich-discuss-bounces at mcs.anl.gov
>
>
>
> Do you actually need separate communicators so that you can invoke
> collective operations?  Could you instead share the communicator among
> many/all threads and separate point-to-point communication by using unique
> tags for each thread?
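>
> For example, here is a minimal sketch of that pattern, assuming
> MPI_THREAD_MULTIPLE is available; worker() and its arguments are just
> illustrative names, not part of any existing code:
>
>     #include <mpi.h>
>
>     /* Each thread receives its own jobs on the shared communicator,
>        distinguished purely by a per-thread tag. */
>     void worker(MPI_Comm shared_comm, int my_tag, int controller_rank)
>     {
>         int job;
>         MPI_Recv(&job, 1, MPI_INT, controller_rank, my_tag,
>                  shared_comm, MPI_STATUS_IGNORE);
>         /* ... process job ... */
>     }
>
>     int main(int argc, char **argv)
>     {
>         int provided;
>         MPI_Comm shared_comm;
>         MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>         /* One communicator per process, shared by all of its threads. */
>         MPI_Comm_dup(MPI_COMM_WORLD, &shared_comm);
>         /* ... spawn ~30 threads, each calling worker(shared_comm, tid, 0) ... */
>         MPI_Comm_free(&shared_comm);
>         MPI_Finalize();
>         return 0;
>     }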
>
> -Dave
>
> On Feb 27, 2012, at 8:31 AM CST, Lewis Alderton wrote:
>
>> We were hoping to use more than 2000 communicators - on each node we want
>> about 100 processes, each running about 30 threads. Creating 3000
>> communicators (one per thread) seemed to be the easiest way of doing this
>> (this way our job controller can broadcast a job to each thread). Is
>> there a better way of doing this?
>>
>>
>>
>> From:		 Pavan Balaji<balaji at mcs.anl.gov>
>> To:		 mpich-discuss at mcs.anl.gov
>> Cc:		 Krishna Kandalla<kandalla at cse.ohio-state.edu>,
>>             mvapich-core at cse.ohio-state.edu, Lewis
>>             Alderton/Marlborough/IBM at IBMUS
>> Date:		 02/26/2012 11:25 AM
>> Subject:		 Re: [mpich-discuss] [mvapich-discuss] How many Groups ?
>>
>>
>>
>>
>> Why are you not freeing the older communicators?  Are you really
>> looking for more than 2000 *active* communicators?  The number of bits
>> set aside for context IDs can be increased, but that loses some
>> internal optimizations within MPICH2 that apply when the source, tag,
>> and context ID all fit within 64 bits for queue searches.
>> Alternatively, you can take some bits away from the tag space and give
>> them to the context ID space.
>>
>> But this looks like a bad application that doesn't free its resources.
>> Fixing the application seems much easier.
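>>
>> For instance, the loop in the reproducer quoted below could release
>> each communicator and group as soon as it is done with them; a rough
>> sketch (comm and group here are single temporaries, not arrays):
>>
>>     for (i = 0; i < num_groups; ++i) {
>>         MPI_Group_incl(orig_group, world_size, group_members, &group);
>>         MPI_Comm_create(MPI_COMM_WORLD, group, &comm);
>>         /* ... use comm for this iteration's work ... */
>>         MPI_Comm_free(&comm);     /* context ID becomes reusable */
>>         MPI_Group_free(&group);
>>     }
>>
>> That keeps the number of *active* communicators per process small no
>> matter how many iterations run.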
>>
>>   -- Pavan
>>
>> On 02/25/2012 11:43 PM, Krishna Kandalla wrote:
>>> Hi,
>>>     We recently received the following post, and we see the same
>>> behavior with mpich2-1.5a1. We realize that this is because we are
>>> running out of context IDs. Do you folks think it is feasible to
>>> increase the range of allowable context IDs?
>>>
>>>
>>>     The following code can be used to reproduce this behavior:
>>>
>>> ---------------------------------------------------------
>>>
>>> #include <mpi.h>
>>> #include <stdlib.h>
>>>
>>> int main(int argc, char **argv)
>>> {
>>>     int num_groups = 1000, my_rank, world_size, *group_members, i;
>>>     MPI_Group orig_group, *groups;
>>>     MPI_Comm *comms;
>>>
>>>     MPI_Init(&argc, &argv);
>>>
>>>     if ( argc > 1 ) num_groups = atoi(argv[1]);
>>>
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>>>     MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
>>>
>>>     group_members = calloc(world_size, sizeof(int));
>>>     groups = calloc(num_groups, sizeof(MPI_Group));
>>>     comms = calloc(num_groups, sizeof(MPI_Comm));
>>>
>>>     for ( i = 0; i < world_size; ++i ) group_members[i] = i;
>>>
>>>     /* Communicators are deliberately never freed, so the run eventually
>>>        exhausts the available context IDs. */
>>>     for ( i = 0; i < num_groups; ++i )
>>>     {
>>>         MPI_Group_incl(orig_group, world_size, group_members, &groups[i]);
>>>         MPI_Comm_create(MPI_COMM_WORLD, groups[i], &comms[i]);
>>>     }
>>>
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>> The observed error is:
>>> PMPI_Comm_create(656).........:
>>> MPI_Comm_create(MPI_COMM_WORLD, group=0xc80700f6, new_comm=0x1d1ecd4) failed
>>> PMPI_Comm_create(611).........:
>>> MPIR_Comm_create_intra(266)...:
>>> MPIR_Get_contextid(554).......:
>>> MPIR_Get_contextid_sparse(785): Too many communicators
>>> [cli_0]: aborting job:
>>>
>>>
>>>
>>> Thanks,
>>> Krishna
>>>
>>>
>>> On Thu, Feb 23, 2012 at 1:32 PM, Lewis Alderton <lalderto at us.ibm.com>
>>> wrote:
>>>
>>>
>>>     I'm using MPI_Group_incl to create many groups. There seems to be a
>>>     limit of 2048 groups - any way to increase this number?
>>>
>>>     ( I'm using mvapich2-1.8a1p1 )
>>>
>>>     Thanks.
>>>
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>>
>>
>>
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

