[mpich-discuss] stuck on MPI_Finalize and -mpilog

Anthony Chan chan at mcs.anl.gov
Wed Mar 19 20:50:04 CDT 2008


What kind of MPI calls did you use in your code?

On Wed, 19 Mar 2008, Tiago Silva wrote:

> Actually I was making an error, sorry about that. I just got the barrier
> to work.
> In any case, the initial behavior still occurs: I get stuck at
> mpi_finalize unless I compile one of the programs with -mpilog.
>
>
> Tiago Silva wrote:
>> Thanks,
>>
>> I followed your suggestion and introduced
>>   call MPI_Barrier(MPI_COMM_WORLD,ierror)
>> in the three executables, just before the finalize call. Unfortunately, when
>> the first program reaches the barrier, the whole job crashes
>> with a rather generic error:
>> p0_30484:  p4_error: interrupt SIGSEGV: 11
>> (Killed by signal 2 in stdout)
>> That doesn't make any sense to me either... Any ideas?
>>
>> ** On the coexistence of mpich libraries compiled with different compilers:
>> I did some testing at the time and couldn't get a second mpich2 version
>> to work. I concluded that there was a conflict with the first
>> version's daemons already running on the machine (I can't recall the
>> details, although the installation is still around). That is why I had the
>> idea of using mpich1 with the p4 device, as it doesn't use those daemons.
>> Is there a way of making two mpich2 versions coexist?
>>
>> Cheers,
>> Tiago
>>
>> Anthony Chan wrote:
>>
>>> On Tue, 18 Mar 2008, tsilva at coas.oregonstate.edu wrote:
>>>
>>>
>>>> Hi all,
>>>>
>>>> I am working on an MPMD application, and all three programs were getting
>>>> stuck on reaching MPI_Finalize. I can get it to work with minimalistic
>>>> code, but when I build the complete components it gets stuck. Also,
>>>> it seems to me that the mpi_sends and mpi_recvs are well matched in my
>>>> code (see the sketch below). Now this is the gory part: when I compile
>>>> the problematic program with -mpilog the job completes with no problems.
>>>> I discovered this by chance and find it disquieting that (the lack of) a
>>>> compiler option will break the code. Any ideas of what could be happening
>>>> here? I don't want the final code to use -mpilog.
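>>>>
>>>> By "well matched" I mean every send has a receive with the same source,
>>>> tag and communicator. A stripped-down two-program sketch of the pattern
>>>> (placeholder names and values, not my actual code) would be:
>>>>
>>>>   ! prog_a.f90 -- started as rank 0 of the MPMD job
>>>>   program prog_a
>>>>     implicit none
>>>>     include 'mpif.h'
>>>>     integer :: ierr
>>>>     real :: x
>>>>     call MPI_Init(ierr)
>>>>     x = 1.0
>>>>     ! send one value to rank 1 with tag 99
>>>>     call MPI_Send(x, 1, MPI_REAL, 1, 99, MPI_COMM_WORLD, ierr)
>>>>     call MPI_Finalize(ierr)
>>>>   end program prog_a
>>>>
>>>>   ! prog_b.f90 -- started as rank 1 of the MPMD job
>>>>   program prog_b
>>>>     implicit none
>>>>     include 'mpif.h'
>>>>     integer :: ierr, status(MPI_STATUS_SIZE)
>>>>     real :: x
>>>>     call MPI_Init(ierr)
>>>>     ! matching receive: same count, type, tag and communicator, source rank 0
>>>>     call MPI_Recv(x, 1, MPI_REAL, 0, 99, MPI_COMM_WORLD, status, ierr)
>>>>     call MPI_Finalize(ierr)
>>>>   end program prog_b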
>>>>
>>> You could have either a race condition or even a memory error in your
>>> program.  Try putting MPI_Barrier before MPI_Finalize and recompile your
>>> program (without -mpilog) to see if it finishes normally.
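>>>
>>> As a rough sketch, the tail end of each of the three programs would then
>>> read something like (arguments as in the MPI Fortran bindings):
>>>
>>>   ! synchronize all ranks before shutting MPI down
>>>   call MPI_Barrier(MPI_COMM_WORLD, ierror)
>>>   call MPI_Finalize(ierror)
>>>
>>> If every rank reaches the barrier, the trouble is in the shutdown itself;
>>> if one rank never arrives, some earlier communication is probably left
>>> unmatched.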
>>>
>>>
>>>> some details:
>>>> AMD
>>>> Lahey Fortran 64bit Pro
>>>> mpich 1.2.7p1 (p4 device)
>>>>
>>>> Why mpich1? The cluster has mpich2 compiled with PGI. Because
>>>> this compiler has a bug that conflicts with some of my code, I was
>>>> forced to use Lahey Fortran for my project, and I found that I could
>>>> build a local mpich1 and use its mpirun without interfering with
>>>> mpich2's daemon.
>>>>
>>> Why don't you build your own version of mpich2 with the Lahey Fortran
>>> compiler?  mpich2 is more robust than mpich1, so it may be worth the
>>> extra effort to get mpich2 working with your code.
>>>
>>> A.Chan
>>>
>>>
>>>> Cheers,
>>>> Tiago
>>>>
>>>>
>>>>
>>
>>
>>
>
>
> -- 
> --
> Tiago A. M. Silva
> Postdoc Associate Researcher
> College Of Oceanic and Atmospheric Sciences
> Oregon State University
> Burt 2-Room 426A
> 104 COAS Administration Building
> Corvallis OR 97331-5503
> USA
> Phone: +1 541 737 5283
> Fax:   +1 541 737 2064
>
>



