[mpich-discuss] [mvapich-discuss] How many Groups ?

Lewis Alderton lalderto at us.ibm.com
Mon Feb 27 08:31:18 CST 2012


We were hoping to use more than 2000 communicators: on each node we want
about 100 processes, each running about 30 threads. Creating 3000
communicators (one per thread) seemed to be the easiest way of doing this,
since our job controller can then broadcast a job to each thread. Is
there a better way of doing this?
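
For concreteness, here is a minimal sketch of the pattern we have in mind
(NUM_THREADS and the controller rank are placeholders, and MPI would have
to be initialized with MPI_THREAD_MULTIPLE):

#include <mpi.h>

#define NUM_THREADS 30              /* placeholder: threads per process */

static MPI_Comm thread_comms[NUM_THREADS];

/* Called once per process, before the worker threads start.
   MPI_Comm_dup is collective, so every rank must execute the
   loop in the same order; afterwards thread slot t on every
   rank shares thread_comms[t]. */
void setup_thread_comms(void)
{
    int i;
    for (i = 0; i < NUM_THREADS; ++i)
        MPI_Comm_dup(MPI_COMM_WORLD, &thread_comms[i]);
}

/* Worker thread t receives its job on its private communicator,
   so the broadcast cannot collide with any other thread's traffic. */
void receive_job(int t, void *job, int job_bytes, int controller_rank)
{
    MPI_Bcast(job, job_bytes, MPI_BYTE, controller_rank, thread_comms[t]);
}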



From:	Pavan Balaji <balaji at mcs.anl.gov>
To:	mpich-discuss at mcs.anl.gov
Cc:	Krishna Kandalla <kandalla at cse.ohio-state.edu>,
            mvapich-core at cse.ohio-state.edu, Lewis
            Alderton/Marlborough/IBM at IBMUS
Date:	02/26/2012 11:25 AM
Subject:	Re: [mpich-discuss] [mvapich-discuss] How many Groups ?




Why are you not freeing the older communicators?  Are you really
looking for more than 2000 *active* communicators?  The number of bits
set aside for context IDs (2^11 = 2048, which matches the limit you are
hitting) can be increased, but that loses some internal MPICH2
optimizations for queue searches, which apply when the source, tag and
context ID together fit within 64 bits.  Alternatively, you can take
away some number of bits from the tag space and give them to the
context ID space.
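
If bits were taken from the tag space, an application could confirm the
reduced tag range at run time; here is a minimal sketch using the standard
MPI_TAG_UB attribute (nothing MPICH2-specific, and it must run after
MPI_Init):

#include <mpi.h>
#include <stdio.h>

/* Query the implementation's upper bound on message tags;
   shrinking the tag space to widen the context ID space
   would show up as a smaller MPI_TAG_UB. */
void print_tag_ub(void)
{
    int *tag_ub, flag;
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag);
    if (flag)
        printf("MPI_TAG_UB = %d\n", *tag_ub);
}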

But this looks like a bad application that doesn't free its resources.
Fixing the application seems much easier.
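
For example, here is a minimal sketch of how the loop in the reproducer
below could release what it creates, assuming each communicator is only
needed within its own iteration:

#include <mpi.h>

/* Create, use, and immediately release num_groups communicators,
   so only one context ID is live at any time. */
void churn_comms(MPI_Group orig_group, int world_size,
                 int num_groups, int *group_members)
{
    MPI_Group g;
    MPI_Comm c;
    int i;

    for (i = 0; i < num_groups; ++i) {
        MPI_Group_incl(orig_group, world_size, group_members, &g);
        MPI_Comm_create(MPI_COMM_WORLD, g, &c);

        /* ... use c ... */

        MPI_Comm_free(&c);   /* returns the context ID to the pool */
        MPI_Group_free(&g);  /* groups do not consume context IDs,
                                but freeing avoids leaking handles */
    }
}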

  -- Pavan

On 02/25/2012 11:43 PM, Krishna Kandalla wrote:
> Hi,
>     We recently received the following post, and we see the same
> behavior with mpich2-1.5a1. We realize that this is because we are
> running out of context IDs. Do you folks think it is feasible to
> increase the range of allowable context IDs?
>
>
>     The following code can be used to reproduce this behavior:
>
> ---------------------------------------------------------
>
> #include <mpi.h>
> #include <stdlib.h>
>
> int main(int argc, char **argv)
> {
>     int num_groups = 1000, my_rank, world_size, *group_members, i;
>     MPI_Group orig_group, *groups;
>     MPI_Comm *comms;
>
>     MPI_Init(&argc, &argv);
>
>     if (argc > 1) num_groups = atoi(argv[1]);
>
>     MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>     MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
>
>     group_members = calloc(world_size, sizeof(int));
>     groups = calloc(num_groups, sizeof(MPI_Group));
>     comms = calloc(num_groups, sizeof(MPI_Comm));
>
>     for (i = 0; i < world_size; ++i) group_members[i] = i;
>
>     /* Every iteration allocates a fresh context ID and never
>        frees it, so a large num_groups exhausts the ID space. */
>     for (i = 0; i < num_groups; ++i)
>     {
>         MPI_Group_incl(orig_group, world_size, group_members, &groups[i]);
>         MPI_Comm_create(MPI_COMM_WORLD, groups[i], &comms[i]);
>     }
>
>     MPI_Finalize();
>     return 0;
> }
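>
> Running it with a group count above the 2048 limit (for example,
> "mpiexec -n 2 ./a.out 3000"; the binary name is just illustrative)
> reproduces the failure.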
>
> The observed error is:
> PMPI_Comm_create(656).........:
> MPI_Comm_create(MPI_COMM_WORLD, group=0xc80700f6, new_comm=0x1d1ecd4) failed
> PMPI_Comm_create(611).........:
> MPIR_Comm_create_intra(266)...:
> MPIR_Get_contextid(554).......:
> MPIR_Get_contextid_sparse(785): Too many communicators
> [cli_0]: aborting job:
>
>
>
> Thanks,
> Krishna
>
>
> On Thu, Feb 23, 2012 at 1:32 PM, Lewis Alderton <lalderto at us.ibm.com> wrote:
>
>
>     I'm using MPI_Group_incl to create many groups. There seems to be a
>     limit of 2048 groups - any way to increase this number?
>
>     ( I'm using mvapich2-1.8a1p1 )
>
>     Thanks.
>

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji




