[mpich-discuss] stuck on MPI_Finalize and -mpilog

Tiago Silva tsilva at coas.oregonstate.edu
Wed Mar 19 20:28:55 CDT 2008


Actually I was making an error, sorry about that. I just got the barrier
to work.
In any case, the initial behavior still occurs: I get stuck at
MPI_Finalize unless I compile one of the programs with -mpilog.
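
(For reference, a minimal sketch of the pattern being discussed -- a
barrier right before the finalize in each of the three executables.
The program name and the communication in the middle are placeholders,
not my actual components:)

      program component
      implicit none
      include 'mpif.h'
      integer ierror, rank

      call MPI_Init(ierror)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)

      ! ... the component's matched mpi_sends/mpi_recvs go here ...

      ! hold every process here so nobody enters MPI_Finalize while
      ! the others are still communicating
      call MPI_Barrier(MPI_COMM_WORLD, ierror)
      call MPI_Finalize(ierror)
      end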


Tiago Silva wrote:
> Thanks,
>
> I followed your suggestion and introduced
>   call MPI_Barrier(MPI_COMM_WORLD,ierror)
> on the three executables, before the finalization. Unfortunately when
> the first program reaches the barrier, the whole communicator crashes
> with a rather generic error:
> p0_30484:  p4_error: interrupt SIGSEGV: 11
> (Killed by signal 2 in stdout)
> Doesn't make any sense either... Any ideas?
>
> ** On the coexistence of mpich libraries compiled with different compilers:
> I did some testing at the time and couldn't get a second mpich2 version
> to work. I concluded then that there was a conflict with the first
> version's daemons already running on the machine (I can't recall the
> details, although the installation is still around). That is why I had
> the idea of using mpich1 with the p4 device, as it doesn't use those
> daemons. Is there a way of making two mpich2 versions coexist?
>
> Cheers,
> Tiago
>
> Anthony Chan wrote:
>   
>> On Tue, 18 Mar 2008, tsilva at coas.oregonstate.edu wrote:
>>
>>     
>>> Hi all,
>>>
>>> I am working on an MPMD application, and all three programs were
>>> getting stuck on reaching MPI_Finalize. I can get it to work with
>>> minimalistic code, but when I build the complete components it gets
>>> stuck. Also, it seems to me that the mpi_sends and mpi_recvs are
>>> well matched in my code.
>>> Now this is the gory part: when I compile the problematic program
>>> with -mpilog the job completes with no problems. I discovered this
>>> by chance and find it disquieting that (the lack of) a compiler
>>> option will break the code. Any ideas of what could be happening
>>> here? I don't want the final code to use -mpilog.
>>>       
>> You could have a race condition (or even a memory error) in your
>> program.  Try putting MPI_Barrier before MPI_Finalize and recompile your
>> program (without -mpilog) to see if the program finishes normally.
>>
>>     
>>> some details:
>>> AMD
>>> Lahey Fortran 64bit Pro
>>> mpich 1.2.7p1 (p4 device)
>>>
>>> Why the mpich1? The cluster has mpich2 compiled with PGI. Because
>>> this compiler has a bug that conflicts with some of my code, I was
>>> forced to use Lahey Fortran for my project and found that I could
>>> create a local mpich1 and use mpirun without interfering with
>>> mpich2's daemons.
>>>       
>> Why don't you build your own version of mpich2 with the Lahey Fortran
>> compiler?  mpich2 is more robust than mpich1; it may be worth the
>> extra effort to get mpich2 working with your code.
>>
>> A.Chan
>>
>>     
>>> Cheers,
>>> Tiago
>>>
>>>
>>>       
>
>
>   
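
(And since the question of a second mpich2 came up above: a rough
sketch of what a private, Lahey-built mpich2 install might look like.
The compiler driver name (lf95), the configure variables and the paths
are assumptions on my part -- check the mpich2 installation guide for
the exact options:)

  ./configure --prefix=$HOME/mpich2-lahey F77=lf95 F90=lf95
  make
  make install
  export PATH=$HOME/mpich2-lahey/bin:$PATH   # use this install's mpif90/mpirun first

Installing under a private --prefix at least keeps the two installs
separate on disk; whether their daemons can run side by side is the
part I had trouble with before.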


-- 
Tiago A. M. Silva
Postdoc Associate Researcher
College Of Oceanic and Atmospheric Sciences
Oregon State University
Burt 2-Room 426A
104 COAS Administration Building
Corvallis OR 97331-5503
USA
Phone: +1 541 737 5283
Fax:   +1 541 737 2064



