[mpich-discuss] stuck on MPI_Finalize and -mpilog

Tiago Silva tsilva at coas.oregonstate.edu
Wed Mar 19 19:27:20 CDT 2008


Thanks,

I followed your suggestion and introduced
  call MPI_Barrier(MPI_COMM_WORLD,ierror)
in the three executables, just before the call to MPI_Finalize.
Unfortunately, when the first program reaches the barrier, the whole
communicator crashes with a rather generic error:
p0_30484:  p4_error: interrupt SIGSEGV: 11
(Killed by signal 2 in stdout)
Doesn't make any sense either... Any ideas?
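
For reference, the placement is roughly as follows. This is only a
minimal sketch, not the actual code: the program name "component" and
the elided work are placeholders, and it assumes the mpif.h include
file from the local mpich1 installation.

```fortran
program component
  implicit none
  include 'mpif.h'
  integer :: ierror

  call MPI_Init(ierror)

  ! ... model work and the matched sends/receives go here (elided) ...

  ! Every rank of all three executables must reach this point
  ! before any of them goes on to finalize.
  call MPI_Barrier(MPI_COMM_WORLD, ierror)
  call MPI_Finalize(ierror)
end program component
```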

** On the coexistence of mpich libraries compiled with different compilers:
I did some testing at the time and couldn't get a second mpich2 version
to work. I concluded then that there was a conflict with the 1st
version's daemons already running on the machine (I can't recall the
details, although the installation is still around). Thus I had the idea
of using mpich1 with the p4 device, as it doesn't use the same daemons.
Is there a way of making two mpich2 versions coexist?

Cheers,
Tiago

Anthony Chan wrote:
>
>
> On Tue, 18 Mar 2008, tsilva at coas.oregonstate.edu wrote:
>
>> Hi all,
>>
>> I am working on an MPMD job, and all three programs were getting
>> stuck on reaching MPI_Finalize. I can get it to work with minimalistic
>> code, but when I build the complete components it gets stuck. Also,
>> it seems to me that mpi_sends and mpi_recvs are well matched in my code.
>> Now this is the gory part: when I compile the problematic program
>> with -mpilog the job completes with no problems. I discovered this
>> by chance and find it disquieting that (the lack of) a compiler
>> option will break the code. Any ideas of what could be happening
>> here? I don't want the final code to use -mpilog
>
> You could have a race condition (or even a memory error) in your
> program.  Try putting MPI_Barrier before MPI_Finalize and recompile your
> program (without -mpilog) to see if the program finishes normally.
>
>>
>> some details:
>> AMD
>> Lahey Fortran 64bit Pro
>> mpich 1.2.7p1 (p4 device)
>>
>> Why the mpich1? The cluster has mpich2 compiled with PGI. Because
>> this compiler has a bug that conflicts with some of my code I was
>> forced to use Lahey Fortran  for my project and found that I could
>> create a local mpich1 and use mpirun without interfering with
>> mpich2's daemon.
>
> Why don't you build your own version of mpich2 with the Lahey Fortran
> compiler?  mpich2 is more robust than mpich1, so it may be worth the
> extra effort to get mpich2 working with your code.
>
> A.Chan
>
>>
>> Cheers,
>> Tiago
>>
>>
>


-- 
Tiago A. M. Silva
Postdoc Associate Researcher
College Of Oceanic and Atmospheric Sciences
Oregon State University
Burt 2-Room 426A
104 COAS Administration Building
Corvallis OR 97331-5503
USA
Phone: +1 541 737 5283
Fax:   +1 541 737 2064



