[mpich-discuss] Hydra, Runtime error

Evans Jeffrey jje at purdue.edu
Wed Jan 26 10:17:44 CST 2011


All,

> To my knowledge, no one in the community that uses this program (Fire Dynamics Simulator, an open-source CFD code tailored to fire, produced by a community led by the National Institute of Standards and Technology) has attempted to use hydra. They are still running on mpd.
[...]
I would not call myself a member of that community, but I have successfully run FDS on a Linux cluster using MPICH2-1.1.1p1 with the hydra process manager. We have upgraded to 2.1.3, but I have not yet had a chance to rerun FDS. My experience is limited to the FDS examples; I have not done anything more sophisticated.

jje

Jeffrey J. Evans
jje at purdue.edu
http://web.ics.purdue.edu/~evans6/



On Jan 26, 2011, at 10:19 AM, Dave Goodell wrote:

> On Jan 26, 2011, at 7:44 AM CST, Paul Hart wrote:
> 
>> I have been using mpd as the process manager. I would like to change to hydra since mpd is being deprecated. I compiled MPICH2-1.3.1 and was able to run the cpi example program. I then attempted to run another program and received the following error (run in verbose mode for more information). I am able to run the same program using mpd.
>> 
>> To my knowledge, no one in the community that uses this program (Fire Dynamics Simulator, an open-source CFD code tailored to fire, produced by a community led by the National Institute of Standards and Technology) has attempted to use hydra. They are still running on mpd.
> [...]
>> Fatal error in PMPI_Gatherv: Internal MPI error!, error stack:
>> PMPI_Gatherv(376).....: MPI_Gatherv failed(sbuf=0x27a6c40, scount=1, MPI_DOUBLE_PRECISION, rbuf=0x27a6c40, rcnts=0x25b9670, displs=0x25b96f0, MPI_DOUBLE_PRECISION, root=0, MPI_COMM_WORLD) failed
>> MPIR_Gatherv_impl(189): 
>> MPIR_Gatherv(102).....: 
>> MPIR_Localcopy(346)...: memcpy arguments alias each other, dst=0x27a6c40 src=0x27a6c40 len=8
>> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
> 
> It doesn't look like the problem has anything to do with hydra.  Your program is passing dst==src to MPI_Gatherv, but MPI does not permit the send and receive buffers to alias each other.  We usually try to check for these sorts of things at a high level, but occasionally we miss the upper-level check and this lower-level check in MPIR_Localcopy triggers instead.  This check was added sometime after 1.2, IIRC, so you hit it because you upgraded, not because of hydra.
> 
> The correct fix is to pass MPI_IN_PLACE as the value of sendbuf at the root process.
> 
> I'll put an error check in MPI_Gatherv in order to make the error a bit easier to understand.
> 
> -Dave
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
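
For anyone reading the error stack in the quoted message: the "memcpy arguments alias each other" line (dst=0x27a6c40 src=0x27a6c40 len=8) is the kind of message an overlap test in front of a memcpy would produce. Below is a minimal sketch of such a test in plain C. It is not MPICH's source; the names regions_overlap and checked_copy are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if [a, a+len) and [b, b+len) share at least one byte, else 0.
 * Addresses are compared as integers so the test stays well defined even
 * when the two pointers refer to unrelated objects. */
static int regions_overlap(const void *a, const void *b, size_t len)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;

    if (len == 0)
        return 0;
    return (pa < pb) ? (pb - pa < len) : (pa - pb < len);
}

/* Hypothetical helper: copies len bytes unless the two regions alias.
 * Returns 0 on success, -1 if the copy was refused.
 * memcpy on overlapping regions is undefined behavior in C, which is
 * why the copy is rejected instead of attempted. */
static int checked_copy(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    if (regions_overlap(dst, src, len))
        return -1;
    memcpy(dst, src, len);
    return 0;
}
```

With dst == src and len = 8, as in the trace, checked_copy refuses the copy. That matches the situation described in the quoted reply, where the root passes the same address as both send and receive buffer.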


