[mpich-discuss] stuck on MPI_Finalize and -mpilog

Anthony Chan chan at mcs.anl.gov
Wed Mar 19 20:39:54 CDT 2008



On Wed, 19 Mar 2008, Tiago Silva wrote:

> Thanks,
>
> I followed your suggestion and introduced
>  call MPI_Barrier(MPI_COMM_WORLD,ierror)
> in the three executables, just before finalization. Unfortunately, when
> the first program reaches the barrier, the whole communicator crashes
> with a rather generic error:
> p0_30484:  p4_error: interrupt SIGSEGV: 11
> (Killed by signal 2 in stdout)
> Doesn't make any sense either... Any ideas?

"SIGSEGV: 11" suggests that your code is accessing invalid memory.

>
> ** On the coexistence of mpich libraries compiled with different compilers:
> I did some testing at the time and couldn't get a second mpich2 version
> to work. I concluded at the time that there was a conflict with the
> first version's daemons already running on the machine (I can't recall the
> details, although the installation is still around). That is why I had the
> idea of using mpich1 with the p4 device, as it doesn't use those daemons.
> Is there a way of making two mpich2 versions coexist?

We have had multiple versions of mpich coexisting on the same machine.
You just need to use the absolute paths when invoking
mpicc/mpif90/mpiexec...
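
For example (the prefix /opt/mpich2-lahey below is just a placeholder for
wherever you installed your second build):

   /opt/mpich2-lahey/bin/mpif90 -o model model.f90
   /opt/mpich2-lahey/bin/mpiexec -n 4 ./model

Each installation keeps its own bin/, lib/ and include/, so the two builds
won't interfere as long as you don't mix them up through your PATH.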

A.Chan
>
> Cheers,
> Tiago
>
> Anthony Chan wrote:
>>
>>
>> On Tue, 18 Mar 2008, tsilva at coas.oregonstate.edu wrote:
>>
>>> Hi all,
>>>
>>> I am working on an MPMD application, and all three programs were getting
>>> stuck on reaching MPI_Finalize. I can get it to work with minimalistic
>>> code, but when I build the complete components it gets stuck. Also,
>>> it seems to me that mpi_sends and mpi_recvs are well matched in my code.
>>> Now this is the gory part: when I compile the problematic program
>>> with -mpilog the job completes with no problems. I discovered this
>>> by chance and find it disquieting that (the lack of) a compiler
>>> option will break the code. Any ideas of what could be happening here? I
>>> don't want the final code to use -mpilog.
>>
>> You could have either a race condition or even a memory error in your
>> program.  Try putting MPI_Barrier before MPI_Finalize and recompile your
>> program (without -mpilog) to see if the program finishes normally.
>>
>>>
>>> some details:
>>> AMD
>>> Lahey Fortran 64bit Pro
>>> mpich 1.2.7p1 (p4 device)
>>>
>>> Why the mpich1? The cluster has mpich2 compiled with PGI. Because
>>> this compiler has a bug that conflicts with some of my code, I was
>>> forced to use Lahey Fortran for my project, and found that I could
>>> create a local mpich1 and use mpirun without interfering with
>>> mpich2's daemon.
>>
>> Why don't you build your own version of mpich2 with the Lahey Fortran
>> compiler?  mpich2 is more robust than mpich1, so it may be worth the
>> extra effort to get mpich2 working with your code.
>>
>> A.Chan
>>
>>>
>>> Cheers,
>>> Tiago
>>>
>>>
>>
>
>
> --
> Tiago A. M. Silva
> Postdoc Associate Researcher
> College Of Oceanic and Atmospheric Sciences
> Oregon State University
> Burt 2-Room 426A
> 104 COAS Administration Building
> Corvallis OR 97331-5503
> USA
> Phone: +1 541 737 5283
> Fax:   +1 541 737 2064
>
>



