[MPICH] stuck in bcast

Martin Kleinschmidt mk at theochem.uni-duesseldorf.de
Fri Nov 10 03:58:42 CST 2006


On Fri, 27 Oct 2006, Geoff Jacobs wrote:

>Martin Kleinschmidt wrote:
>> On Thu, 26 Oct 2006, Rajeev Thakur wrote:
>> 
>>> Is there enough memory allocated for the buffer? 
>> 
>>             if (myid .eq. 0) then
>>                allocate(adiag(nsaf), ediag(nsaf), stat = ierr)
>>                if (ierr .ne. 0) stop 'allocation error in sigdire'
>> 
>> c              (calculate ediag)
>>             endif
>> 
>>             if (myid .ne. 0) then
>>                allocate(ediag(nsaf), stat = ierr)
>>                if (ierr .ne. 0) stop 'allocation error in sigdire'
>>             endif
>> c for testing if ediag is correctly allocated:
>>             do ia = 1, nsaf
>>                ediag(ia) = dble(ia)
>>             enddo
>> c  this is working up to nsaf=1495039
>>             do ii = 1490001, 1500001
>>                call MPI_bcast(ediag, ii,
>>      $              MPI_double_precision, 0, MPI_Comm_World, MPIerr)
>>                write(*,*)'bcast success, ii=',ii
>>             enddo
>> 
>>> If you can send us a small test program that demonstrates the error, it
>>> would be useful.
>> 
>> This is one of my problems: I have not yet been able to extract an
>> example from my code that still produces the error.
>> Maybe it depends on something my code does before reaching the bcast,
>> but I can't imagine what it might be...
>
>Can you try this with a different compiler (I could help with this if you
>don't have access to one otherwise)?

Sorry for not answering for so long ... my boss had me manage a lot of
other "urgent" problems :-(

I compiled MPICH2 and my code with the Lahey Fortran compiler (and gcc):
the problem is the same, but it appears at a different count. The last
successful broadcast is:

bcast success, ii= 1780221
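For scale, assuming 8-byte MPI_DOUBLE_PRECISION elements, the last successful count in each build corresponds to the following message size (1495039 is the figure from the comment in the quoted test loop, which was run with the Intel build):

```latex
\begin{aligned}
\text{Intel build:}\quad & 1\,495\,039 \times 8\ \text{B} = 11\,960\,312\ \text{B} \approx 11.4\ \text{MiB} \\
\text{Lahey/gcc build:}\quad & 1\,780\,221 \times 8\ \text{B} = 14\,241\,768\ \text{B} \approx 13.6\ \text{MiB}
\end{aligned}
```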

To be thorough, my current MPICH2 configuration is:
##############
export CC=gcc
export CFLAGS="-O2 "
export F77=lf95
export FFLAGS="-O2 "
export F90=lf95
export F90FLAGS="-O2 "
export CXX=g++
export CXXFLAGS="-O2 "
export CPP=cpp
#export LDFLAGS=-static

./configure -prefix=/usr/local/encap/mpich2-1.0.4p1-lf -with-comm=shared --disable-devdebug --with-arch=LINUX
#############

In my first post, I had been using the Intel compilers with this MPICH2 configuration:
############
export CC=icc
export CFLAGS="-O2 -w"
export F77=ifort
export FFLAGS="-O2 -w"
export F90=ifort
export F90FLAGS="-O2 -w"
export CXX=icc
export CXXFLAGS="-O2 -w"
export CPP=cpp
export LDFLAGS=-static

./configure -prefix=/usr/local/encap/mpich2-1.0.4p1-intel -with-comm=shared --disable-devdebug --with-arch=LINUX
############


So I get a similar error in both cases, with different Fortran AND C
compilers.

I'll try again to reduce my code to something smaller which still
reproduces the error.
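A stand-alone reduction of the quoted test loop, which could serve as the small test program asked for earlier in the thread, might look like the sketch below. It keeps the fixed-form style of the original snippet. The program name bcast_test, the upper bound nmax = 1800001 (chosen to span both reported thresholds) and the step of 1000 between counts are illustrative assumptions, not values taken from my code.

```fortran
c     Hypothetical stand-alone reduction of the bcast test (a sketch,
c     not the original code).  NMAX and the step of 1000 are assumed
c     values chosen so the loop spans both thresholds in this thread.
      program bcast_test
      implicit none
      include 'mpif.h'
      integer nmax
      parameter (nmax = 1800001)
      double precision, allocatable :: buf(:)
      integer myid, ierr, mpierr, i, n

      call MPI_Init(mpierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, myid, mpierr)

c     every rank allocates the full buffer and touches every element
      allocate(buf(nmax), stat = ierr)
      if (ierr .ne. 0) stop 'allocation error in bcast_test'
      do i = 1, nmax
         buf(i) = dble(i)
      enddo

c     broadcast growing prefixes of buf from rank 0 to all ranks;
c     each rank reports its own completion so a hang can be located
      do n = 1490001, nmax, 1000
         call MPI_Bcast(buf, n,
     $        MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, mpierr)
         write(*,*) myid, ' bcast success, n=', n
      enddo

      deallocate(buf)
      call MPI_Finalize(mpierr)
      end
```

Built with the MPICH2 Fortran wrapper (e.g. mpif90) and started with mpiexec on two or more processes, a hang would show up as the last "bcast success" line printed by each rank.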

Martin



