[mpich-discuss] Errors related to the increased number of tasks

Gustavo Correa gus at ldeo.columbia.edu
Fri Dec 16 09:35:23 CST 2011


Hi Bernard

Am I mistaken, or does your main routine perhaps call only
MPI_Init?
Your main seems to call only 'basicTest', but not 'rank',
where other MPI routines appear.

The MPICH2 developers may shed some light here,
but I think MPI_Init alone doesn't make a minimal MPI program.
You need at least MPI_Finalize as well, I guess.
Or not?

Also, unrelated to your C program:
since you are on Linux, why did you choose g77 to compile the Fortran-77 bindings
and f95 [is this g95?] to compile the Fortran-90 bindings of MPICH2?
g77 is quite old; I have had better luck using gfortran to compile both
the Fortran 77 and 90 bindings.

I hope this helps,
Gus Correa

On Dec 16, 2011, at 9:49 AM, Bernard Chambon wrote:

> Hi,
> 
> On Dec 15, 2011, at 5:22 PM, Bernard Chambon wrote:
> 
>> I'm still working on failures encountered as the number of tasks increases
>> (using mpich2-1.4, compiled with gcc 4.1, on Scientific Linux 5, 2.6.18-238.12cc.el5)
>> 
> 
> Further tests on the same machine, with mpich2 1.0, then 1.1, 1.2, etc.
> 
>  >mpich2version
> MPICH2 Version:    	1.0.8p1
> MPICH2 Release date:	Unknown, built on Tue Apr 21 13:52:10 CEST 2009
> MPICH2 Device:    	ch3:sock
> MPICH2 configure: 	-prefix=/usr/local/mpich2
> MPICH2 CC: 	gcc  -O2
> MPICH2 CXX: 	c++  -O2
> MPICH2 F77: 	g77  -O2
> MPICH2 F90: 	f95  -O2
> 
>  >mpicc -O2 -I $MPICH_HOME/include -L $MPICH_HOME/lib -o bin/basic_test basic_test.c
> 
>  >mpiexec -np 256 bin/basic_test
> Running 256 tasks 
> 
>  >mpiexec -np 512 bin/basic_test
> Running 512 tasks 
> 
>  >mpiexec -np 512 bin/basic_test
> Running 512 tasks 
> 
> 
> 
> With MPICH2 1.1 and later, I get an error at around 150 tasks.
> I probably omitted something when compiling those versions, but I don't know where to look.
> 
> 
>  >mpich2version 
> MPICH2 Version:    	1.1b1
> MPICH2 Release date:	Unknown, built on Fri Dec 16 15:30:19 CET 2011
> MPICH2 Device:    	ch3:nemesis
> MPICH2 configure: 	--prefix=//scratch/BC/mpich2-1.1
> MPICH2 CC: 	/usr/bin/gcc -m64 -O2
> MPICH2 CXX: 	c++ -m64 -O2
> MPICH2 F77: 	/usr/bin/f77  -O2
> MPICH2 F90: 	f95  -O2
> 
> 
>  >mpicc -O2 -I $MPICH_HOME/include -L $MPICH_HOME/lib -o bin/basic_test basic_test.c
>  >mpiexec -np 100 bin/basic_test
> Running 100 tasks 
> 
>  >mpiexec -np 120 bin/basic_test
> Running 120 tasks 
> 
>  >mpiexec -np 150 bin/basic_test
> Assertion failed in file /scratch/BC/mpich2-1.1b1/src/util/wrappers/mpiu_shm_wrappers.h at line 919: seg_sz > 0
> internal ABORT - process 0
> rank 0 in job 26  ccwpge0001_56217   caused collective abort of all ranks
>   exit status of rank 0: return code 1 
> 
> 
>  >mpich2version 
> MPICH2 Version:    	1.2.1
> MPICH2 Release date:	Unknown, built on Fri Dec 16 13:40:20 CET 2011
> MPICH2 Device:    	ch3:nemesis
> MPICH2 configure: 	--prefix=//scratch/BC/mpich2-1.2
> MPICH2 CC: 	/usr/bin/gcc -m64 -O2
> MPICH2 CXX: 	c++ -m64 -O2
> MPICH2 F77: 	/usr/bin/f77  -O2
> MPICH2 F90: 	f95  -O2
> 
>  >mpicc -O2 -I $MPICH_HOME/include -L $MPICH_HOME/lib -o bin/basic_test basic_test.c
> 
> 
>  >mpiexec -np 96 bin/basic_test
> Running 96 tasks 
>  >mpiexec -np 96 bin/basic_test
> Running 96 tasks 
>  >mpiexec -np 120 bin/basic_test
> Running 120 tasks 
>  >mpiexec -np 120 bin/basic_test
> Running 120 tasks 
>  >mpiexec -np 130 bin/basic_test
> Assertion failed in file /scratch/BC/mpich2-1.2.1/src/util/wrappers/mpiu_shm_wrappers.h at line 923: seg_sz > 0
> internal ABORT - process 0
> rank 0 in job 16  ccwpge0001_56217   caused collective abort of all ranks
>   exit status of rank 0: return code 1 
> 
> Best regards
> 
> 
> PS :
> the test code
> 
> #include <mpi.h>
> #include <stdio.h>
> #include <unistd.h>  /* sleep */
> 
> int basicTest(int argc, char** argv) {
>  if (MPI_Init(&argc, &argv) != MPI_SUCCESS ) {
>   printf("Error calling MPI_Init !!, exiting \n") ; fflush(stdout);
>   return(1);
>  }
> 
>  int rank;
>  if ( MPI_Comm_rank(MPI_COMM_WORLD, &rank)!= MPI_SUCCESS ) {
>   printf("Error calling  MPI_Comm_rank !!, exiting \n") ; fflush(stdout);
>   MPI_Abort(MPI_COMM_WORLD, 1);
>   return(1);
>  }
>  
>  if (rank == 0) {
>   int nprocs;
>   if (MPI_Comm_size(MPI_COMM_WORLD, &nprocs)!= MPI_SUCCESS ) {
>    printf("Error calling  MPI_Comm_size !!, exiting \n") ; fflush(stdout);
>    MPI_Abort(MPI_COMM_WORLD, 1);
>    return(1);
>   }
>  
>   printf("Running %d tasks \n", nprocs) ; fflush(stdout);
>   MPI_Finalize(); 
>   return(0); 
>  } else {
>   sleep(1);
>   MPI_Finalize();  // Needed if and only if version <= mpich2-1.2
>   return(0);
>  }
> 
> }
> /******************************/
> int main(int argc, char** argv) {
>   /* Propagate basicTest's status as the process exit code. */
>   return basicTest(argc, argv);
> }
> 
> 
> ---------------
> Bernard CHAMBON
> IN2P3 / CNRS
> 04 72 69 42 18
> 


