[mpich-discuss] stuck on MPI_Finalize and -mpilog

Tiago Silva tsilva at coas.oregonstate.edu
Thu Mar 20 14:37:01 CDT 2008


Anthony Chan wrote:
> We have multiple versions of mpich coexisting on the same machine.
> You just need to use absolute paths to invoke the mpicc/mpif90/mpiexec...

The issue is having a local version of mpich2. Just to confirm: mpich2's
mpiexec needs an mpd or smpd daemon running on every node, right? Would it
be possible for me to run these as a regular user?
The other problem is that I recall having trouble building mpich2
with lf90.
>
> What kind of MPI calls did you use in your code?
>
I am using several libraries (mpp_io, psmile) that make a multitude of
MPI calls. I think I am using MPI_Bsend in place of the standard MPI_Send.

MPI_INITIALIZED
MPI_INIT
MPI_COMM_RANK
MPI_COMM_SIZE
MPI_COMM_GROUP
MPI_FINALIZE
MPI_GROUP_INCL
MPI_COMM_CREATE
MPI_BARRIER
MPI_ABORT
MPI_Recv
MPI_Send
MPI_Isend
MPI_Bsend
MPI_Pack/MPI_Unpack
MPI_Wait
MPI_Allgather
MPI_Buffer_attach/MPI_Buffer_detach

>
>> Actually, I had made a mistake, sorry about that. I just got the barrier
>> to work.
>> In any case, the original behavior persists: the programs hang in
>> MPI_Finalize unless I compile one of them with -mpilog.
>>
>>
>> Tiago Silva wrote:
>>> Thanks,
>>>
>>> I followed your suggestion and added
>>> call MPI_Barrier(MPI_COMM_WORLD,ierror)
>>> to all three executables, just before finalization. Unfortunately, when
>>> the first program reaches the barrier, the whole communicator crashes
>>> with a rather generic error:
>>> p0_30484: p4_error: interrupt SIGSEGV: 11
>>> (Killed by signal 2 in stdout)
>>> This doesn't make any sense either. Any ideas?
>>>
>>> ** On the coexistence of mpich libraries compiled with different
>>> compilers:
>>> I did some testing at the time and couldn't get a second mpich2 version
>>> to work. I concluded then that there was a conflict with the daemons
>>> of the first version, which were already running on the machine (I can't
>>> recall the details, although the installation is still around). Thus I
>>> had the idea of using mpich1 with the p4 device, as it doesn't use those
>>> daemons. Is there a way of making two mpich2 versions coexist?
>>>
>>> Cheers,
>>> Tiago
>>>
>>> Anthony Chan wrote:
>>>
>>>> On Tue, 18 Mar 2008, tsilva at coas.oregonstate.edu wrote:
>>>>
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am working on an MPMD application, and all three programs were
>>>>> getting stuck upon reaching MPI_Finalize. I can get it to work with
>>>>> minimal code, but when I build the complete components it gets stuck.
>>>>> Also, it seems to me that the mpi_sends and mpi_recvs in my code are
>>>>> well matched.
>>>>> Now this is the gory part: when I compile the problematic program
>>>>> with -mpilog, the job completes with no problems. I discovered this
>>>>> by chance and find it disquieting that (the lack of) a compiler
>>>>> option breaks the code. Any ideas of what could be happening here? I
>>>>> don't want the final code to use -mpilog.
>>>>>
>>>> You could have a race condition (or even a memory error) in your
>>>> program. Try putting MPI_Barrier before MPI_Finalize and recompiling
>>>> your program (without -mpilog) to see whether it finishes normally.
>>>>
>>>>
>>>>> some details:
>>>>> AMD
>>>>> Lahey Fortran 64bit Pro
>>>>> mpich 1.2.7p1 (p4 device)
>>>>>
>>>>> Why mpich1? The cluster has mpich2 compiled with PGI. Because that
>>>>> compiler has a bug that conflicts with some of my code, I was forced
>>>>> to use Lahey Fortran for my project, and I found that I could build
>>>>> a local mpich1 and use mpirun without interfering with mpich2's
>>>>> daemon.
>>>>>
>>>> Why don't you build your own version of mpich2 with the Lahey Fortran
>>>> compiler? mpich2 is more robust than mpich1, so it may be worth the
>>>> extra effort to get mpich2 working with your code.
>>>>
>>>> A.Chan
>>>>
>>>>
>>>>> Cheers,
>>>>> Tiago
>> -- 
>> Tiago A. M. Silva
>> Postdoc Associate Researcher
>> College Of Oceanic and Atmospheric Sciences
>> Oregon State University
>> Burt 2-Room 426A
>> 104 COAS Administration Building
>> Corvallis OR 97331-5503
>> USA
>> Phone: +1 541 737 5283
>> Fax: +1 541 737 2064