[mpich-discuss] Errors related to the increased number of tasks
Gustavo Correa
gus at ldeo.columbia.edu
Fri Dec 16 09:35:23 CST 2011
Hi Bernard
Am I mistaken, or does your main routine perhaps call only
MPI_Init?
Your main seems to call only 'basicTest', but not 'rank',
where other MPI routines appear.
The MPICH2 developers may shed some light here,
but I think MPI_Init alone doesn't make a minimal MPI program.
You need at least MPI_Finalize, I guess.
Or not?
Also, unrelated to your C program:
since you are on Linux, why did you choose g77 to compile the Fortran-77 bindings,
and f95 [is this g95?] to compile the Fortran-90 bindings of MPICH2?
g77 is quite old; I have had better luck using gfortran to compile both
the Fortran 77 and 90 bindings.
I hope this helps,
Gus Correa
On Dec 16, 2011, at 9:49 AM, Bernard Chambon wrote:
> Hi,
>
> On 15 Dec 2011, at 17:22, Bernard Chambon wrote:
>
>> I'm still working on failures encountered as the number of tasks increases
>> (using mpich2-1.4, compiled with gcc 4.1, on Scientific Linux 5, 2.6.18-238.12cc.el5)
>>
>
> Other tests on the same machine, with mpich2 1.0, then 1.1, 1.2, etc.
>
> >mpich2version
> MPICH2 Version: 1.0.8p1
> MPICH2 Release date: Unknown, built on Tue Apr 21 13:52:10 CEST 2009
> MPICH2 Device: ch3:sock
> MPICH2 configure: -prefix=/usr/local/mpich2
> MPICH2 CC: gcc -O2
> MPICH2 CXX: c++ -O2
> MPICH2 F77: g77 -O2
> MPICH2 F90: f95 -O2
>
> >mpicc -O2 -I $MPICH_HOME/include -L $MPICH_HOME/lib -o bin/basic_test basic_test.c
>
> >mpiexec -np 256 bin/basic_test
> Running 256 tasks
>
> >mpiexec -np 512 bin/basic_test
> Running 512 tasks
>
> >mpiexec -np 512 bin/basic_test
> Running 512 tasks
>
>
>
> With MPICH2 1.1 and beyond, I got an error at around 150 tasks.
> I probably omitted something when compiling those versions, but I don't know where to look.
>
>
> >mpich2version
> MPICH2 Version: 1.1b1
> MPICH2 Release date: Unknown, built on Fri Dec 16 15:30:19 CET 2011
> MPICH2 Device: ch3:nemesis
> MPICH2 configure: --prefix=//scratch/BC/mpich2-1.1
> MPICH2 CC: /usr/bin/gcc -m64 -O2
> MPICH2 CXX: c++ -m64 -O2
> MPICH2 F77: /usr/bin/f77 -O2
> MPICH2 F90: f95 -O2
>
>
> >mpicc -O2 -I $MPICH_HOME/include -L $MPICH_HOME/lib -o bin/basic_test basic_test.c
> >mpiexec -np 100 bin/basic_test
> Running 100 tasks
>
> >mpiexec -np 120 bin/basic_test
> Running 120 tasks
>
> >mpiexec -np 150 bin/basic_test
> Assertion failed in file /scratch/BC/mpich2-1.1b1/src/util/wrappers/mpiu_shm_wrappers.h at line 919: seg_sz > 0
> internal ABORT - process 0
> rank 0 in job 26 ccwpge0001_56217 caused collective abort of all ranks
> exit status of rank 0: return code 1
>
>
> >mpich2version
> MPICH2 Version: 1.2.1
> MPICH2 Release date: Unknown, built on Fri Dec 16 13:40:20 CET 2011
> MPICH2 Device: ch3:nemesis
> MPICH2 configure: --prefix=//scratch/BC/mpich2-1.2
> MPICH2 CC: /usr/bin/gcc -m64 -O2
> MPICH2 CXX: c++ -m64 -O2
> MPICH2 F77: /usr/bin/f77 -O2
> MPICH2 F90: f95 -O2
>
> >mpicc -O2 -I $MPICH_HOME/include -L $MPICH_HOME/lib -o bin/basic_test basic_test.c
>
>
> >mpiexec -np 96 bin/basic_test
> Running 96 tasks
> >mpiexec -np 96 bin/basic_test
> Running 96 tasks
> >mpiexec -np 120 bin/basic_test
> Running 120 tasks
> >mpiexec -np 120 bin/basic_test
> Running 120 tasks
> >mpiexec -np 130 bin/basic_test
> Assertion failed in file /scratch/BC/mpich2-1.2.1/src/util/wrappers/mpiu_shm_wrappers.h at line 923: seg_sz > 0
> internal ABORT - process 0
> rank 0 in job 16 ccwpge0001_56217 caused collective abort of all ranks
> exit status of rank 0: return code 1
>
> Best regards
>
>
> PS :
> the test code
>
> #include <stdio.h>
> #include <unistd.h>   /* sleep */
> #include <mpi.h>
>
> int basicTest(int argc, char** argv) {
>   if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
>     printf("Error calling MPI_Init !!, exiting \n"); fflush(stdout);
>     return(1);
>   }
>
>   int rank;
>   if (MPI_Comm_rank(MPI_COMM_WORLD, &rank) != MPI_SUCCESS) {
>     printf("Error calling MPI_Comm_rank !!, exiting \n"); fflush(stdout);
>     MPI_Abort(MPI_COMM_WORLD, 1);
>     return(1);
>   }
>
>   if (rank == 0) {
>     int nprocs;
>     if (MPI_Comm_size(MPI_COMM_WORLD, &nprocs) != MPI_SUCCESS) {
>       printf("Error calling MPI_Comm_size !!, exiting \n"); fflush(stdout);
>       MPI_Abort(MPI_COMM_WORLD, 1);
>       return(1);
>     }
>
>     printf("Running %d tasks \n", nprocs); fflush(stdout);
>     MPI_Finalize();
>     return(0);
>   } else {
>     sleep(1);
>     MPI_Finalize(); // Needed if and only if <= mpich2-1.2
>     return(0);
>   }
> }
> /******************************/
> int main(int argc, char** argv) {
>   return basicTest(argc, argv);  // pass the test status on as the exit code
> }
>
>
> ---------------
> Bernard CHAMBON
> IN2P3 / CNRS
> 04 72 69 42 18
>